PREFACE TO THE HANDBOOK

The 1978 amendments to Title VII of the Elementary and Secondary
Education Act (ESEA) of 1968 mandate the Secretary of Education to
develop and publish models to evaluate Title VII bilingual education 
projects with respect to the progress made by project participants in 
attaining English language skills (Section 731(d)(2)). Section 
731(e)(3) of the Act also mandates the Secretary to develop evaluation 
and data-gathering models that consider the linguistic and cultural 
differences of bilingual children; the availability and operation of 
State bilingual programs; and variables relevant to describing Title 
VII projects, such as length of the program, hours of instruction, and 
qualifications of the teachers. Section 721(b)(3)(C)(iii) also
requires that each basic grant include a plan for program evaluation.

In response to the mandate, the Department of Education initiated an
undertaking entitled "A Project for Developing Program Evaluation and
Data Gathering Models for ESEA Title VII Bilingual Education
Programs." The efforts of this activity produced the Handbook for
Evaluating ESEA Title VII Bilingual Education Programs. The Handbook,
designed primarily for program directors and evaluators of bilingual
programs, comprises three volumes: a User's Guide, a Designer's
Manual, and a Technical Appendix. The Handbook meets the requirements
of the Act and provides basic guidelines for conducting evaluations of
Title VII bilingual education programs.

How the Handbook was Developed

The three-volume series is intended to provide program directors and 
evaluators with practical guidelines and recommended approaches for
determining what should be included in an evaluation and how to 
conduct an evaluation. The Handbook may also serve as a reference 
guide for other persons associated with the bilingual program, such as 
teachers and parents. 

Two major activities were undertaken in developing the Handbook. 
First, information describing current evaluation practices and data 
gathering activities being conducted by Title VII bilingual programs 
was collected from programs identified by State Education Agencies 
(SEAs) and Local Education Agencies (LEAs). As a result of the 
information collected, parameters of evaluation issues such as student 
needs, languages served, program settings and designs, and the costs 
associated with an evaluation were formulated. The second activity 
reviewed the literature on evaluation methodology related to bilingual 
education and determined the potential utility of current evaluation 
theory and practices, as reported in the literature, to the evaluation 
of basic classroom bilingual programs funded under Title VII of the 
ESEA of 1968, as amended. Information collected from both of these 
activities was then utilized in developing the Handbook. 



The Handbook is Organized Into Three Volumes 



Volume I, A User's Guide to Evaluation Basics, is intended to provide
the planners with an overview of evaluation issues and a summary of
procedures required to perform an evaluation. The guide presents
summary information corresponding to the more detailed evaluation 
procedures presented in the Designer's Manual (Volume II). 

Volume II, the Designer's Manual, is designed for the individual(s)
actually conducting the evaluation, and contains guidelines, forms,
and worksheets necessary to conduct an effective evaluation. The
manual consists of five chapters addressing the following evaluation
activities: 

o Planning, managing, and staffing the evaluation; 

o Establishing baseline data required for the 
evaluation; 

o Monitoring program operation; 

o Evaluating student outcomes; and 

o Preparing the evaluation report. 

Volume III, the Technical Appendix, is a collection of technical
articles on topics of interest such as characteristics of
specific tests, explanations of key issues in evaluation, and theoretical
justifications of evaluation procedures that cannot be found easily in
the literature, as well as full-size copies of the various evaluation
worksheets.

How to Use the Three Volumes Effectively

To benefit fully from the Handbook, the user is encouraged to read the 
User's Guide in its entirety. This will provide the user with a 
comprehensive overview of the entire evaluation process. Volume II 
will then direct and recommend specific actions, activities, and steps 
to be used in conducting the evaluation. If followed correctly,
Volume II will provide the user with a systematic approach to designing
and conducting the evaluation. The technical information in Volume III,
which covers different evaluation issues and methods, may be used as
reference material.

The special needs and goals of bilingual education programs require
educators and administrators to examine the appropriateness and
effectiveness of the program continually. A carefully planned and
conducted evaluation helps meet this challenge when it is designed to
meet the requirements of the funding agency, to enhance the program's
management and operations, and to provide useful information for the
program administrators to use in improving the program.







VOLUME I 



A 

USER'S GUIDE TO EVALUATION BASICS 



FOR 



EVALUATING ESEA TITLE VII 
BILINGUAL EDUCATION PROGRAMS 



Prepared By: 
INTERAMERICA RESEARCH ASSOCIATES, INC. 
1555 Wilson Boulevard 
Rosslyn, Virginia 22209 



MAY 1982 






This document was prepared for the U.S. Department of Education by
InterAmerica Research Associates, Inc. under Contract ED-300-80-0598,
funded through the ESEA Title VII Part C Research Agenda for Bilingual
Education. Contractors undertaking such projects are encouraged to
express their judgment freely in professional and technical matters.
Therefore, the statements, findings, conclusions, and recommendations
herein do not necessarily reflect the views of the U.S. Department of
Education.






TABLE OF CONTENTS
Volume I

Overview

Conceptual Framework for the Evaluation

CHAPTER I: PLANNING THE EVALUATION

1. Select an Evaluator and Assign Responsibilities
2. Determine the Audiences and What to Evaluate
3. Set Priorities and Establish Timelines
4. Determine Level of Effort, Budget and Allocate Resources
5. Plan the Data Analysis Function
6. Plan the Data Interpretation and Development of Recommendations
7. Plan the Reporting of the Evaluation

CHAPTER II: ESTABLISHING BASELINE DATA REQUIRED FOR THE EVALUATION

1. Describe the Context of the Program
2. Describe the Students
3. Describe the Program Goals
4. Describe the Instructional Program
5. Document and Report the Baseline Data

CHAPTER III: CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS

1. Evaluating Program Instruction
2. Evaluating Staff Development
3. Evaluating Parent Involvement
4. Reporting the Evaluation Results

CHAPTER IV: CONDUCTING THE EVALUATION OF STUDENT OUTCOMES

1. Developing the Evaluation Design
2. Evaluating the English Language Component
3. Evaluating the Non-English Language Component
4. Evaluating Student Performance in Academic Areas
5. Evaluating Affective Areas of Student Performance
6. Conducting the Data Collection Activity
7. Analyzing Student Outcome Data
8. Interpreting the Results of the Evaluation

CHAPTER V: PREPARING THE EVALUATION REPORT






OVERVIEW 

This document represents the first of a three-volume series
constituting the Handbook for Evaluating ESEA Title VII Bilingual
Education Programs. The Handbook provides practical guidelines and
recommended approaches for bilingual education program directors and
evaluators to use in evaluating bilingual programs.

In the development of the Handbook, it was readily recognized that a
single document could not be equally suitable to all bilingual
education programs. Obviously, bilingual education programs cover a
range of languages and grade levels in a variety of settings. Some
programs have large evaluation budgets and access to teams of trained
and experienced evaluators, while others have limited budgets and
limited human resources. Additionally, the Handbook is intended to
serve different persons with different needs. Therefore, this
document, Volume I, A User's Guide to Evaluation Basics, is designed
for persons associated with the program, but not necessarily directly
involved in conducting the evaluation. The users of the guide could
be program directors, as well as other persons associated with the
bilingual program, such as teachers, parents, or district
administrators.

The User's Guide to Evaluation Basics provides an overview of 
evaluation issues and summarizes the procedures required to conduct an 
effective evaluation. The guide provides a summary description of the 
five components of a bilingual education program evaluation. These
include: planning, managing, and staffing the evaluation;
establishing baseline data required for evaluation; monitoring program 
operations; evaluating student outcomes; and analyzing and reporting 
evaluation results. 

Volume II, entitled The Designer's Manual for Conducting an Evaluation,
is designed to be used by the persons actually conducting the
evaluation. The manual describes how to implement each of the five
components of the evaluation. The Designer's Manual contains
guidelines, procedures, and worksheets to assist the program director
and/or program evaluator in completing the specific tasks associated
with the overall program evaluation.

Volume III, entitled The Technical Appendix, contains a collection of
reference materials covering different issues and topics related to
evaluation practices. These are intended to assist program directors
and program evaluators in building upon or expanding the evaluation
activities identified and discussed in Volumes I and II.
Reproducible copies of all worksheets presented in Volume II are
contained in Volume III.

CONCEPTUAL FRAMEWORK FOR THE EVALUATION



Bilingual education programs represent a unique instructional approach
that uses two languages to meet educational goals, generally
providing instruction in academic subjects in the student's first
(home) language (L1) while developing the English language skills of
the students. The students served by bilingual programs also reflect
a wide diversity in culture, socio-economic status, and educational 
experiences. These aspects distinguish bilingual education programs 
from all other instructional approaches. 

The primary goal of bilingual education programs is the development of 
English language skills of the students as well as the development of 
their home language. Teachers recruited to teach in these programs, 
therefore, need to have language skills in the two languages being 
used for instruction. Curriculum materials in the first language are 
also needed. 

Other goals of bilingual programs often include the development of the 
student's self-concept by emphasizing the home culture and the 
improvement of his or her performance in other academic projects. In 
order to accomplish these goals, knowledge of the students' culture by 
the classroom teacher and culturally relevant curriculum materials are 
a necessity in bilingual programs.

Due to these factors, the evaluation of bilingual education programs
must be performed with considerable caution. The selection of an
evaluation approach must take into consideration the variety of 
educational services, the curriculum materials used in the classroom, 
the number of hours of instruction provided in English and in the 
first language, the language skills of the classroom teacher, as well 
as the educational experience and language skills of the students.

Because of the complexity of this educational context, experimental or 
quasi-experimental evaluation designs are often not appropriate to 
evaluate bilingual programs. Experimental designs usually require 
random selection of students. However, random selection is not 
realistic in a bilingual education context, because it would require
that students who are eligible to receive bilingual education
instruction be placed in alternative programs for control purposes.
Similarly, the unique and differing characteristics of the students 
and the difference in the instructional services they receive make it 
very difficult to find comparable comparison groups necessary for 
quasi-experimental designs. The consensus of the literature 
addressing the evaluation of bilingual programs also indicates that 
the use of standardized tests to evaluate bilingual student progress 
is of dubious value. Despite these limitations, some formal 
measurement of student academic achievement must be undertaken in 
bilingual education programs.

The Recommended Evaluation Model

The evaluation model presented in this Handbook, therefore, is designed
solely to provide descriptive information about the operation of the
bilingual program and the academic performance of the students
enrolled in the program. The information gathered through this
process can be used to evaluate student progress and to some degree 
provide a barometer of program effectiveness. The model is based on 
the premise that an evaluation of a bilingual program should: 

o Provide descriptive information about the 
operations of the bilingual program; and 

o Provide information describing student performance 
(even if inferences about program impact cannot be
made).

Therefore, the model requires the collection of student outcome data 
to determine if the students are making progress in their learning.
It also requires the collection of information on "how" the program is 
operating. 



The model is also practical and realistic in relation to the financial 
and human resources available to conduct evaluations of bilingual 
programs. Aside from the expertise and time of the immediate 
personnel of most programs, the majority of bilingual programs have 
limited funds (generally between $2,000 and $5,000 per year) to secure
private consultants to perform or assist with the evaluation. 
Therefore, the model takes into consideration the amount of time and 
effort that can reasonably be expected to be given to the evaluation 
effort.

The recommended evaluation model consists of two components. The
first component focuses on program operations (e.g., program goals, 
time spent on instruction, etc.) using a discrepancy evaluation 
design. The design relies heavily on descriptive data about
the program and therefore requires, as an initial step, the establishment
of comprehensive baseline data on the program, the students, and the
community.

The actual evaluation and data collection activities needed for the
evaluation of program operations are performed primarily through 
search and review of program documents such as the grant proposals, 
previous evaluation reports, student files, and related material, as 
well as personnel interviews and the monitoring of classroom 
instruction. Personnel interviews to gather information on how the 
program is being operated are conducted with the program director, 
teachers, district administrators, and parents. Monitoring of 
classroom instruction is performed through observation to determine if 
the instruction is being carried out as planned and in accordance with 
the original program design. 

The discrepancy evaluation attempts to identify and document
differences between the initial plans of the program and the actual
manner in which the program is operating. Information about
discrepancies between the planned and actual program activities, as 
identified by the discrepancy evaluation, may be used to make 
decisions on how to continue operation of the program and what changes 
might be required. 
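
As an illustration only, the comparison at the heart of a discrepancy
evaluation can be sketched in a few lines of Python. The sketch assumes
that planned and observed program activities have already been reduced
to numeric measures. The function name find_discrepancies, the
tolerance parameter, the measure names, and all figures are
hypothetical and are not taken from the Handbook.

```python
def find_discrepancies(planned, actual, tolerance=0.0):
    """Compare planned program measures with observed ones.

    planned and actual map a measure name to a number (hypothetical
    measures, e.g. hours of instruction per week). A measure that was
    planned but never observed is treated as zero. Returns a dict of
    measure -> (observed - planned) for every measure whose difference
    exceeds the tolerance.
    """
    discrepancies = {}
    for measure, planned_value in planned.items():
        observed = actual.get(measure, 0.0)
        difference = observed - planned_value
        if abs(difference) > tolerance:
            discrepancies[measure] = difference
    return discrepancies


# Hypothetical figures: the plan as written in the grant proposal versus
# what document review, interviews, and classroom observation recorded.
planned = {
    "english_hours_per_week": 10.0,
    "l1_hours_per_week": 8.0,
    "parent_meetings_per_year": 4.0,
}
actual = {
    "english_hours_per_week": 10.0,
    "l1_hours_per_week": 5.0,
    "parent_meetings_per_year": 2.0,
}

for measure, difference in find_discrepancies(planned, actual).items():
    print(f"{measure}: {difference:+.1f} relative to plan")
```

In this sketch a negative difference means the program delivered less
than planned. Deciding whether a given difference matters, and what
change it calls for, remains a judgment for the program director and
evaluator.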

The second component of the model requires the assessment of student 
outcomes. The student outcomes to be evaluated are:

o English language skills;

o First language skills;

o Academic achievement; and

o Affective areas of student performance. 

Because of the difficulty in conducting program impact evaluations,
the recommended approach to evaluate student outcomes is simply to
evaluate student performance. This approach is referred to in this
Handbook as the basic evaluation or the basic evaluation design. This
basic evaluation design, therefore, answers only the relative
performance question, "To what extent are the bilingual students
achieving?"

The basic design has minimal requirements. These are:

o Testing only the students enrolled in the
bilingual program;

o Using adequate norm-referenced tests (NRTs)
capable of measuring English language skills,
first (L1) language skills, if applicable, and
academic subjects (e.g., math, science, etc.);
and

o Measuring performance for only one academic year.

Applying these minimal design requirements to the first student
outcome component, English language performance, is all that is
required to meet the Federal evaluation requirements. However, most
bilingual programs should at least evaluate performance in two other
outcome areas, first (L1) language and academic subjects.
Additionally, although the basic design does not require a multi-year 
evaluation design, the Handbook does recommend that bilingual programs 
attempt to collect multi-year performance data. At a minimum,
programs should strive to collect data over the duration of their
grant period. It is conceivable that data showing progress over the
life of the program can be used to argue that the bilingual program
was responsible for the outcome.

Data resulting from the analysis of student outcomes can be used as an 
indicator of overall student performance. The data from this
component of the evaluation, in conjunction with the discrepancy data,
can be used to determine what program changes, if any, may be required
to improve student performance.

For example, the discrepancy evaluation of program operations may 
reveal a significant operational change from the original design of 
the instructional program. This change could have had considerable 
impact on the instructional program, to the extent that student 
performance may have been affected. Knowing this, the evaluator will 
be able to analyze and interpret the outcome data affected by this 
change and make recommendations for changes in the program. 

In summary, the purpose of the recommended evaluation model is to 
describe student performance and program operations. It cannot be
used as a measure of program impact. The recommended model meets all 
the requirements established in the Title VII rules and regulations.

The regulations require that each grant have a plan to evaluate the
progress and achievements of the bilingual program. The plan must
include:

o provisions for measuring the accomplishments of
the instructional objectives of the program;

o provisions for measuring the progress of the
students in improving their English language
skills; and

o a procedure for using the information gained from
the evaluation to improve the operation of the
program.

The recommended evaluation model accomplishes this by: 

o performing an evaluation of program operations
using a discrepancy evaluation approach; 

o conducting an assessment of student performance in 
developing English language skills, as well as 
first language skills and performance in academic 
subjects; and 

o conducting an analysis function to determine what 
changes may be required to improve the overall 
operations of the bilingual program. 

The Handbook recommends that bilingual programs not attempt to
determine program impact. However, some basic guidelines for
extending the evaluation to determine impact are presented as optional
activities to the basic evaluation design. Extending the evaluation
beyond the basic design, however, may require more resources than those
normally possessed by Title VII bilingual education programs. The
Handbook also does not address entry and exit procedure issues. The 
procedures are, however, very much intertwined with evaluation of 
bilingual programs and should be considered when planning the 
evaluation. 

CHAPTER I

PLANNING THE EVALUATION 

Planning is the single most important task in conducting an 
evaluation. Although this point seems obvious, research indicates 
that many evaluations of bilingual programs, as well as evaluations of 
other educational programs, are not planned properly. Many 
evaluations occur towards the end of the program year as a last-minute 
thought, simply to produce a report to satisfy some external 
requirement, usually imposed by the funding source. As a result, they 
are often performed haphazardly and produce poor results. 

Evaluations performed in this manner are of little use to either the
program itself or the funding agency. These evaluations usually fail 
to address issues that program and school administrators may have 
about the program because the evaluation design failed to incorporate 
their concerns during the planning process. Likewise, these 
evaluations are not helpful to the funding agency since, at best, they 
were planned too late in the program year to capture useful
information and, at worst, merely represent perfunctory efforts to 
fulfill a reporting requirement. 

The evaluation process, to achieve its own objectives, must be
approached in a serious manner and receive as much priority as other
elements of the educational program. Program administrators must
realize that the evaluation process is a positive activity designed to
provide information on which to base decisions for program 
improvement. 

The planning process carefully balances the reporting requirements of 
the funding agency, the information needs of decisionmakers and 
program administrators, and the scarce resources available to conduct 
the evaluation. It is unlikely that any given bilingual program will 
have the resources needed to address all the information needs of its 
different audiences. Therefore, all parties concerned must realize 
that compromises will have to be made; otherwise, resources will be 
scattered, producing little useful information. 

A properly conducted evaluation requires more than simply evaluating a 
specific component of a bilingual program (e.g., student performance). 
An effective evaluation plan identifies all the questions about the 
program that the evaluation should answer. The evaluation planning 
process, therefore, involves a series of carefully executed steps 
which identify the evaluation audience and their specific information 
needs, set priorities, determine which program components to evaluate, 
allocate scarce evaluation resources, and set timelines for the 
evaluation process. 

Next to proper planning, effective management of the evaluation 
process is a must. One person must assume the responsibility and have 
the authority to direct and manage all facets of the evaluation. All
persons involved in the evaluation process must be made aware of this
authority and be given instructions and directions on how to interact
with that person. A clear chain of command must be delineated. In
most Title VII programs, the program director retains and assumes that 
responsibility. For purposes of presentation, this Handbook assumes 
that the program director is the person responsible for ensuring that 
the evaluation is planned and conducted. 

GUIDELINES FOR PLANNING THE EVALUATION

1. Select an Evaluator and Assign Responsibilities

Proper planning and effective management of the evaluation dictate
that the person responsible for designing and conducting the more
technical aspects of the evaluation be identified as early as possible
in order to become involved in the early decisionmaking of the
evaluation planning process. In the case of most Title VII programs,
this person is usually an independent consultant from outside the 
school system. Ideally, the evaluator should be involved in the 
original design of the bilingual program itself. In the case of Title 
VII programs, this should occur during the proposal writing stage. 
This enables the evaluator to begin working with the program director 
in planning the evaluation before the academic period to be covered by 
the evaluation commences. The plan for conducting the evaluation, if 
at all possible, should be completed prior to the first day of school 
of the academic year being evaluated.

A major responsibility of the program director is to survey the
available human resources in the district and, assuming he or she has 
the authority, decide whether to use an evaluator from within the 
school system or employ an independent evaluator. The possibility of
contracting for the services of an evaluation specialist from a
university or private consulting firm must be weighed against the
potentially lower cost to the program if the evaluation can be
conducted by district personnel. The program director must decide on
a course of action.

Assuming that an independent consultant or a consulting firm is 
contracted to perform or provide assistance in conducting the 
evaluation, the program director should assign clearly defined
responsibilities and specific assignments to the evaluator, the
program personnel assisting with the evaluation, and himself or herself.

The evaluator's function and responsibilities are usually determined 
by the amount of technical assistance needed by the program director 
in carrying out the evaluation. The evaluator's role is therefore 
generally narrower in scope, focusing more on technical matters such 
as test selection, designing data collection procedures and
instruments, conducting data analyses, and reporting the evaluation 
results. 

Listed below are some guidelines to distinguish the roles of the
program director and evaluator in the conduct of the evaluation.
These guidelines take into consideration the fact that the majority of
the evaluation activities will actually be conducted by the program
director and program personnel.

The program director should: 

o Define program goals and objectives; 

o Describe the intended program; 

o Describe student characteristics; 

o Identify target audiences for the evaluation; 

o Determine the major areas to be covered by the
evaluation;

o Identify possible evaluators, and in some cases,
select the evaluator(s) or at least recommend the
evaluator(s);

o Serve as a liaison with the evaluator (or appoint
a staff member to serve as liaison);

o Review the evaluation design prepared by the
evaluator to make sure it meets the evaluation
needs;

o Arrange interviews or write cover letters to
questionnaires to ensure timely response and
cooperation;

o Monitor classroom operations and observation
activities;

o Assign specific evaluation activities to program
personnel;

o Identify trained personnel and/or suggest specific
persons who should be involved in data analysis
and interpretation;

o Review data and react to interpretations and
recommendations before they are included in the
report; and

o Make presentations on the results of the
evaluation.

The evaluator should:

o Design the evaluation based on the information 
needs identified by the program director; 

o Select and/or review instruments to be used in the
evaluation;

o Monitor testing; and

o Analyze the data and report findings.

A clear delineation of responsibilities and responsible management
will ensure that all evaluation activities are performed effectively
and on schedule.

2. Determine the Audiences and What to Evaluate

An evaluation is designed for a particular reason and for a particular
audience. Thus, the first step is to determine who needs information
from the evaluation, what type of information is needed, and for what
purposes. In addition to the program administrators and other
personnel associated with the program, the typical users of evaluation
information usually include:

o The funding agency;

o District administrators;

o School board; and

o Parents and community groups. 

Each audience has different interests and needs. Therefore, the
evaluation design must address the different needs of each audience
and provide the information desired, while remaining within the
budgetary constraints of the program. The Designer's Manual (Volume
II) provides suggestions and worksheets to use to accomplish this.

Evaluations of ESEA Title VII funded programs, however, must pay
particular attention to the rules and regulations pertaining to these
programs. Embodied in these rules and regulations are a number of
provisions that should be viewed as minimal evaluation criteria.
Therefore, the evaluation requirements for basic and demonstration
projects, as described in section 123a.22 of the April 4, 1980 Federal
Register (Vol. 45, No. 67), must be considered in planning the
evaluation.

These regulations require that any program funded under Title VII of
the Elementary and Secondary Education Act (ESEA) of 1968, as amended,
have a plan to evaluate the progress and achievement of the
bilingual program. The plan must include:

o provisions for measuring the accomplishment of the
instructional objectives of the program;

o provisions for measuring the students' progress in
improving their English language skills; and 

o a procedure for using the information gained from 
the evaluation to improve the operation of the 
program. 

3. Set Priorities and Establish Timelines 

The establishment of evaluation priorities is a must for all bilingual
programs. Most bilingual programs allocate $3,000 to $5,000 of their
yearly budgets for the purchase of outside consulting assistance to perform
the evaluation. This amount of money, together with the level of
effort that can be devoted to this one task by the program director
and the rest of the program personnel, constitutes the available
resources to conduct the evaluation. More than likely, the evaluation
needs identified by the exercise described above will far exceed what
can be accomplished with these resources. Consequently, priorities for
the evaluation may have to be established.



The program director must analyze the evaluation needs identified 
through the planning process, assess the resources available to 
conduct the evaluation, and ask the following questions:



o How much can I evaluate? 

o How much do I need to evaluate? 

o How much evaluation assistance can I afford? 

Additional questions, such as the ones below, will also help to 
determine priorities: 

o Is information on the program's capacity to meet
Title VII regulations already available? If
information is available, this information should
be incorporated in the evaluation.

o What are the priority areas (e.g., parent
involvement) of the program? The evaluation
effort should give these areas priority.

o How are the program resources divided among
program components? Areas receiving a large
proportion of program resources should be
candidates for evaluation emphasis.

o If there are insufficient resources to adequately
evaluate all components, are there areas that
should not be evaluated or should the scale of the
evaluation be reduced in some or all areas? This
decision should be made after considering which
areas are already fairly well understood, which
areas are a low program priority, and whether the
evaluation resources are so limited that it would
be best not to evaluate some areas at all rather
than to conduct a general assessment of all areas.

o Which components must be evaluated each year?

After the evaluation priorities have been determined, the program 
director should establish timelines for completing the different 
components, as well as the total evaluation of the program. The 
program director should understand that certain elements of the
evaluation must be performed at very specific times during the
academic year and cannot be delayed or postponed. However, the
program director has to consider the other responsibilities of the
persons assisting with the evaluation. Responsibilities and
assignments may have to be modified as a result of the established 
timelines. 

4. Determine Level of Effort, Budget, and Allocate Resources

One of the most difficult tasks in managing the overall evaluation is
deciding how best to utilize the limited resources available, and yet
meet all the evaluation needs. The assignment of responsibilities and
activities to those contributing to the evaluation process is often
difficult. Because most of the evaluation activities pertaining to
Title VII programs are usually performed by the program personnel,
coordinating time schedules to perform the evaluation with the other
program responsibilities of the personnel can be difficult, especially
if human and financial resources are limited. Nevertheless, the
timely execution of the evaluation is essential. There are activities
within the evaluation process that can be rescheduled; however, others
must be performed as planned in order to produce a reliable product.
The effective program director must exercise initiative and
resourcefulness to ensure that this is accomplished.

Determining how much of the evaluation should be conducted by program 
personnel, which activities should be performed by an independent 
contractor, and how much the total evaluation should cost is often 
difficult for many program directors. Districts with limited contract 
evaluation funds should use most of their contract funds to employ a 
trained and experienced evaluator to assist them in evaluating the 
student outcomes component of the evaluation. Other evaluation tasks, 
such as describing and monitoring program operation, can be performed 
by the program director with assistance from the program personnel as 
a normal part of program management. However, the evaluator should be 
consulted when performing these tasks. If project or district 
personnel are going to be employed to perform the evaluation, the 
program director must make specific assignments and ensure that the 
evaluation activities are performed on schedule. 

A major step in planning and managing the evaluation, therefore, is
determining the level of effort that will be required by each activity
of the evaluation (e.g., evaluating student outcomes) and allocating
adequate financial and human resources to the individual tasks to be
performed. Evaluation resources, financial and human, will vary
widely from district to district. Additionally, the level of effort
for an evaluation is affected by a number of factors, such as:

o Size of the program;

o What aspects of the program are evaluated;

o The number of non-English languages represented in
the population being served by the program; and

o The number of evaluation questions addressed.

The Designer's Manual provides guidelines and worksheets to assist the
program director to allocate resources and budget the evaluation. The
guidelines suggest three different estimated levels of effort that can
be applied in evaluating each program component and the different
tasks within each component. These estimates are based on discussion
with persons who have conducted similar evaluation activities.

5. Plan the Data Analysis Function

The program director and evaluator should plan the specific data 
analysis activities that will be required by the evaluation. The type 
of analysis and techniques to be used will depend largely on the types 
of data collected. Data from the first component of the evaluation 
will consist primarily of narrative descriptions of program 
operations, as well as responses from the interviews collected. Data 
from the second component of the evaluation will be primarily in the 
form of test scores.

The analysis and interpretation of program operations data is a
straightforward comparison activity. The evaluator simply examines
and compares the information collected on the actual operation of the
program to the baseline information describing how the program was
meant to operate. For example, if the goal of the program was to
provide instruction in all academic subjects using the native language
of the students, the analysis function, using the second set of
information, simply ascertains if this indeed occurred. If the goal
was met, the analysis activity documents this. If the instruction did
not occur, the analysis activity also documents this and should
attempt to ascertain what caused the change in the program design.
Both types of findings are recorded and reported in the overall
evaluation report. This type of comparison analysis is all that is
needed by this component of the evaluation.

Analyzing student outcome data is also a straightforward activity, but 
should be performed by a trained evaluator. The analysis activities 
required may be performed by simply following prescribed procedures 
within the manuals that accompany most commercial tests. Programs 
using the basic evaluation design will only be required to perform 
basic analysis, such as frequency distributions, computation of mean 
scores and standard deviations. The analysis activity will also 
require the evaluator to estimate the degree of possible error in the 
results. 
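
The basic analysis named above can be sketched in a few lines. This is
only an illustration, not a procedure prescribed by the Handbook or by
any test publisher's manual: the function name, the bin width, and the
sample scores are all hypothetical, and the standard error of the mean
is used here as one common way to express possible error in a mean
score.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def summarize_scores(scores, bin_width=10):
    """Return a frequency distribution, mean, SD, and standard error.

    `scores` is a list of numeric test scores; `bin_width` (an assumed
    default) sets the width of each frequency-distribution interval.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to compute a standard deviation")
    # Frequency distribution keyed by the lower bound of each interval.
    bins = Counter((s // bin_width) * bin_width for s in scores)
    sd = stdev(scores)  # sample standard deviation (n - 1 denominator)
    return {
        "n": len(scores),
        "frequency": dict(sorted(bins.items())),
        "mean": mean(scores),
        "sd": sd,
        # Standard error of the mean: SD divided by the square root of n.
        "sem": sd / sqrt(len(scores)),
    }


# Hypothetical scores, for illustration only.
example_scores = [42, 55, 58, 61, 63, 67, 70, 74, 78, 82]
summary = summarize_scores(example_scores)
```

A trained evaluator would still decide which scores are comparable and
whether the test manual calls for a different error estimate.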

Interpreting the findings or attempting to find an association between
the findings of the program operations component and the student
outcomes component should be performed very cautiously. The two sets
of information are not meant to be "scientifically merged" in
accordance with sound methodological evaluation practices. However,
an alert and perceptive evaluator may be able to develop some
"intelligent perceptions" about the program based on the two sets of
information. For example, knowing that history was taught using the
home language in the fourth grade, but not in the fifth, the evaluator
may want to closely examine the student outcome data for these two
grades. If the data from the fourth grade students show significantly
higher achievement than that of the fifth graders, the evaluator can
highlight this fact and then present a "professional opinion"
suggesting that the instruction in the native language fostered this
difference in achievement.

The important consideration during the planning stage is to determine
how the analysis function will be conducted. Data analysis will most
probably be performed by the evaluator. The time schedule for the
evaluation should allow ample time to conduct the analyses.

6. Plan the Data Interpretation and Development of
Recommendations

Data interpretation in bilingual program evaluation is often not a
strictly empirical task. To repeat the basic premise of this
Handbook, it is probably impossible to show, by employing conventional
social science research methods, that children in the bilingual
program did better in the program than they would have without it.
Therefore, interpreting the data obtained by evaluation efforts is not
a mechanical exercise of reciting significant alphas. Rather than
concluding that the bilingual program "works" better than some
alternate treatment, the interpretive exercise is more likely to be in
the nature of a policy question. Does the bilingual program "work"
well enough? Are decisionmakers and constituents satisfied with the
program and the students' progress? Recognizing the policy
implication function of data interpretation, an interpretive panel may
be a better alternative to perform this function. Chapter IV provides
more detailed guidelines and procedures for performing the
interpretation function.

Two basic approaches are therefore suggested for data interpretation 
and formulating recommendations for program modification. The first 
approach is for the evaluator to analyze, study, and interpret the 
results. Using informal means, the evaluator then checks the 
interpretations and recommendations with program staff and others as 
he/she deems appropriate. The second approach is to convene a panel 
of people with various perspectives on the program and have them 
interpret the results. The panel may consist of individuals that are
representative of the various audiences. The choice between the two
approaches can be made immediately before the analysis activity begins.

7. Plan the Reporting of the Evaluation 

Preparation of the final evaluation report is an important activity of 
the evaluation. The evaluation report is the final and most visible 
product of the evaluation. Steps should be taken to ensure that the
report addresses the purposes and specific questions of the
decisionmakers for whom the evaluation was planned. In addition, the
evaluation results should be reported in a timely manner, taking care 
to ensure that the technical aspects of the evaluation effort are 
clearly presented. Together, these steps increase the usefulness of 
the evaluation results. 

Several standard elements should be included in the report. These
include:

o Statement of purpose;

o Program overview and background;

o The goals and objectives of the bilingual
program;

o Description of the program and students;

o Discussion of the methodology used, including
design, sampling strategy, instrumentation, and
data analysis procedures; and

o Presentation of the findings, conclusions, and 
recommendations for program change. 

The report should be concise and should include easily interpreted
tables, graphs, and other figures, limiting the amount of narrative
material presented. Important issues should be identified and
highlighted in the report if the results of the evaluation effort are
to be maximized. Techniques such as boxing in recommendations or
using a different type face are useful to highlight the most important
points of the report. Examples of actual data collection instruments
should be included in an appendix. Chapter V provides more detailed
guidelines for developing the report.

CHAPTER II

ESTABLISHING BASELINE DATA REQUIRED FOR THE EVALUATION

The evaluation model for evaluating Title VII bilingual education
programs presented in this Handbook has two components. The first
component evaluates program operations (e.g., program administration,
staff development, parental involvement) using a discrepancy
evaluation design. The second evaluates student outcomes. Results of
these two evaluation activities taken together constitute the basis
for determining how the program operated and provide a description of
student performance.

In order to conduct the discrepancy evaluation of program operations, 
information on how the program was originally designed and intended to 
operate must be collected and documented. This information serves as 
the baseline data, which are compared to the data resulting from the 
actual evaluation of program operations as described in Chapter III. 

The information obtained from the evaluation of program operations is 
taken into account in developing and conducting the evaluation of 
student performance. Therefore, a very early and important step in 
conducting an evaluation of a bilingual program is the establishment 
of baseline information about the total program. 
This description includes identifying who the program is meant to
serve, what are the exact services of the program, how these services
are to be provided, and what outcomes are expected from the services.
Without this description, it is impossible to determine (a) whether
the bilingual program meets the original intent, and (b) whether any
marked achievements can reasonably be attributed to the program.

Comparing the original program design, as described by the baseline
data, to its actual operation, as determined by the evaluation of
program operations, will indicate areas of the program that have
either not been implemented or have changed from the time that the
program was originally designed. Discrepancies identified as a result
of this comparison are a powerful management tool for the program
director and a programmatically useful part of the whole evaluation
process. This comparison can also help to determine whether the goals
of the program are reasonable, and provide information about the
relationship between program activities and program outcomes.

In order to accomplish this, the persons conducting the various
evaluation activities must first develop proper documentation of the
program context, the target students, the program goals, and the
instructional program. This is not a difficult task. The information
to be collected should clearly describe how the program is designed to
meet its goals, as well as the total environment in which the program
operates. Once this documentation is accomplished, the program
director, with assistance from the evaluator, will be able to use the
information to design the evaluation and to analyze and interpret the
evaluation results. The documentation does not need to be elaborate,
simply informative. Most importantly, the information collected
should be complete, detailed, and easy to understand. The Designer's
Manual provides more detailed listings of the different information
that needs to be collected. These listings are also found in the
Technical Appendix.

Baseline Data Needed for the Evaluation

1. Describe the Context of the Program

Develop a simple, but accurate description of the school district and
neighborhood. Data from previous evaluation reports can be easily
updated, thus avoiding surveys or other time-consuming efforts. The
type of information that should be covered in the description
includes:

o Community characteristics

- Languages spoken
- Ethnicity
- Socioeconomic status (SES) levels
- Mobility and length of residence
- Size

o Local Education Agency (LEA) description

- Financial status
- Facilities available for the bilingual
program
- General goals
- Philosophy towards language and cultural
diversity

o School description

- Number of bilingual students by language
group
- Number in the bilingual program
- How students are assigned to classrooms
- Bilinguality mix in classrooms
- Parent involvement in school affairs

The information collected on the context of the program should be
compiled immediately after the data-gathering phase. While technical
analysis of the information is not required, the program director and
evaluator should review the data in order to plan the evaluation of
program operations and make preliminary decisions on how the data will
be used during analysis to determine program outcomes. The
information should be written in narrative form for inclusion in the
final report. The topics and subheadings provided above may serve as
an outline for reporting this information.

2. Describe the Students 



Baseline information about the language proficiency and dominance,
cultural background, and overall academic achievement of the students
enrolled in the bilingual program is essential for designing and
conducting the evaluation. The baseline data must include information
on the skill level of the students in both English and their home
language, as well as their level of performance in the subject areas
being taught. The description should also include information on the
students' learning background and school environment. At a minimum,
the baseline data should include information on the following areas:



o Definition of project student

o Student selection criteria & method

- Tests & cut-off scores used
- Role of teacher judgment
- Role of parent wishes
- Method of combining criteria

o Exit criteria & follow-up

o Student turnover

o Student characteristics at beginning of year

- Language proficiency
- Achievement level
- Biographic data

This information is essential for grouping students according to both 
current skills and past experience during data analysis activities and 
plays a major role in determining student performance.
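
As a rough illustration of how such baseline characteristics might be
used to group students during analysis, the sketch below averages
scores within each combination of two grouping variables. The field
names ("english_proficiency", "years_in_program", "score") and the
records are hypothetical and are not taken from the Handbook; an
evaluator would substitute whatever baseline categories the program
actually recorded.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(students, keys=("english_proficiency", "years_in_program")):
    """Average the 'score' field within each combination of `keys`.

    `students` is a list of dicts; the key names are assumptions made
    for this example only.
    """
    groups = defaultdict(list)
    for student in students:
        # Build the group label from the chosen baseline characteristics.
        label = tuple(student[k] for k in keys)
        groups[label].append(student["score"])
    return {label: mean(scores) for label, scores in groups.items()}


# Hypothetical baseline records, for illustration only.
students = [
    {"english_proficiency": "limited", "years_in_program": 1, "score": 40},
    {"english_proficiency": "limited", "years_in_program": 1, "score": 50},
    {"english_proficiency": "limited", "years_in_program": 2, "score": 58},
    {"english_proficiency": "proficient", "years_in_program": 2, "score": 72},
]
group_means = mean_score_by_group(students)
```

Groups with very few students yield unstable averages, so small cells
are probably better reported as counts than as mean scores.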

3. Describe the Program Goals

Developing a clear and complete description of the goals of the
program is an essential part of establishing baseline data. Goal
setting, although important, is often overlooked or ignored during the
program planning stage. Therefore, many programs operate year-to-year
with little or no set direction. Programs that fail to establish clear
and measurable goals cannot expect to be able to measure program
outcomes.

Programs should distinguish between short-term, intermediate goals
relevant to a single-year evaluation and long-range goals that can be
evaluated only over a period of several years. Failing to make this
distinction creates problems for bilingual programs, since some
long-term goals (e.g., improved English skills) may not be applicable
and measurable until the later grades. Long-term goals are also
affected by the high rate of student turnover experienced by many
bilingual programs. Since long-term goals would not apply to a
short-term student, two sets of goals are required. This should be
clearly stated and presented in the baseline data being collected.

Defining and describing student achievement goals is another important 
step in establishing baseline data. While there are many important 
considerations to recognize when specifying student achievement goals, 
the baseline data must include information on: 

o Subject areas (e.g., reading, language, math);

o Languages to be used (e.g., English, Spanish);

o Student language proficiency category (e.g.,
English: limited or proficient, Spanish: limited
or proficient);

o Grade level; and

o Student affective goals (e.g., self-concept and
attitudes towards school).

Because the original needs of the program, as stated in the proposal,
may have changed, the information collected should be reviewed by the
program director. Changes that have occurred should be properly
documented.



4. Describe the Instructional Program



Establishing baseline data for the instructional program requires more 
time and effort than any of the other three areas on which information 
is collected. Baseline data collection on the program context, 
students, and program goals basically requires the review of existing 
records, files, and the original project proposal. Baseline data 
collection for the instructional program, however, requires 
face-to-face interviews of persons associated with the program, as 
well as review of program documents. 



A description of the instructional program can be divided into three 
categories: 

o An overview of the program as it was originally
designed and initially implemented;

o A description of the instructional approach used
in the program, including (1) student selection,
(2) self-concept and cultural emphasis, (3)
content of instruction, (4) presentation of
content, and (5) scheduling; and

o A description of the management of the program,
including (1) staff organization, (2) staff roles,
(3) staff development, (4) parent and community
factors, (5) communication links with different
audiences, and (6) dissemination of program
information.

The program overview information can be collected easily from
information contained in the grant proposal. It should include the
grade levels and number of classrooms served by the program, the
amount of instructional time devoted to dual language instruction, and
a definition of the program design (maintenance, transitional, etc.).

The actual instructional approach used in the classroom and the basis
for that approach require the most comprehensive description of any
part of the bilingual program. This information is collected from
program-related documents, student files, classroom observations, and
interviews with program administrators, teachers, and parents. This
description is also the most important element used during the data
analysis and interpretation. It is therefore essential that program
personnel pay particular attention to this component. A detailed
listing of the types of information that need to be collected is
provided in the Program Information Acquisition Form found in the
Technical Appendix.

A description of the overall program organization and management is
the last requirement of the baseline data collection activity. This
description will provide the basis for evaluating the operational
effectiveness of the program. A detailed listing of the information
that needs to be collected is provided in the Designer's Manual.

5. Document and Report the Baseline Data

Once the desired information is collected, attention should be focused
on the various ways it is to be used. The information:

o Will be used as baseline information during the
program monitoring activities of the evaluation
process;

o Will provide a partial basis for planning the
analysis and interpretation of student outcomes,
as described in Chapter IV; and

o Will be reported directly to various audiences as
part of the evaluation reports written for them.

Immediately after the preliminary data have been collected, the data
should be summarized in the form that they will appear in the Final
Evaluation Report and submitted to the program director for review.
An initial analysis and interpretation of the data should be conducted
to determine which variables, if any, are to be used as a basis for
separate analyses.

CHAPTER III

CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS



The successful completion of the planning activities and the
establishment of the baseline data for the evaluation enable the
program director to initiate the actual evaluation of the bilingual
program. As described before, the actual evaluation of the bilingual
program takes two thrusts: the evaluation of program operations and
the evaluation of student outcomes. These may be viewed as totally
separate activities. However, the outcomes or outputs of both
activities are used during the analysis function to interpret the
overall evaluation results and formulate recommendations for changes
in the program. This chapter presents guidelines and procedures for
conducting one part of the evaluation, the evaluation of program
operations.

The evaluation of program operations employs the discrepancy
evaluation design described earlier. Therefore, the evaluation of
program operations is performed by first developing a comprehensive
description of the program describing how it was designed to operate.
This establishes the baseline data for the evaluation of program
operations. This activity was hopefully accomplished in accordance
with the recommended procedures in Chapter II. Most importantly, this
activity should have been completed during the first or, at least, by
the end of the second month of the program year. The second activity
required to perform this facet of the evaluation is to collect another
set of data similar to the baseline data on the actual operation of
the program. Decisions on what data to collect, how and when to
collect the data, and who will collect the data will have already been
made during the planning phase of the evaluation activity (see Chapter
I). Most of these data are collected by reviewing program-related
documents, monitoring classroom activities, and interviewing various
persons associated with the bilingual program. This set of data,
describing actual program operation (e.g., the instructional method
being used, the amount of instruction in English, the number of
teacher aides assigned to a class), is compared to the baseline
data collected at the beginning of the school year, which describes
the program design. The comparison provides the basis for determining
if the program was operated as planned. If this is the case, there
should be few or minor discrepancies in the two sets of data which
describe the program. If the comparison reveals significant
discrepancies or deviations, the evaluation must document why this
occurred.
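
The comparison just described can be pictured as a side-by-side check
of two records that share the same feature names. The sketch below is
only one possible way to organize it: the feature names and values are
hypothetical, chosen to echo the examples in the text, and the empty
"reason" field is an assumed placeholder for the explanation the
evaluation is expected to document.

```python
def find_discrepancies(baseline, actual):
    """List features whose planned and observed values differ.

    `baseline` describes the program as designed and `actual` describes
    it as observed; both are dicts keyed by feature name. A feature
    missing from one record is reported with a value of None.
    """
    discrepancies = {}
    for feature in sorted(set(baseline) | set(actual)):
        planned = baseline.get(feature)
        observed = actual.get(feature)
        if planned != observed:
            discrepancies[feature] = {
                "planned": planned,
                "observed": observed,
                "reason": None,  # to be filled in from interviews and records
            }
    return discrepancies


# Hypothetical program records, for illustration only.
baseline = {
    "instructional_method": "native-language social studies",
    "weekly_minutes_in_english": 300,
    "aides_per_class": 1,
}
actual = {
    "instructional_method": "native-language social studies",
    "weekly_minutes_in_english": 420,
    "aides_per_class": 0,
}
report = find_discrepancies(baseline, actual)
```

Whether a listed difference counts as significant remains a judgment
for the program director and evaluator; the check only flags where the
two descriptions disagree.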

Discrepancies in the program operations should not necessarily be
viewed as a negative finding. There are many reasons why a program
may deviate from its original design. The important task is to
determine if this deviation influenced the instructional program. For
example, the program may have intended to provide one hour of
instruction in social studies using the student's native language.
However, due to scheduling modifications, teacher shortage, or other
factors, a change was made during the fourth month of the program and
the instruction did not occur. The evaluation planning process,
nevertheless, most likely identified measures for this area. That is,
the student outcome part of the evaluation was intended to measure the
performance of the students in social studies. The resulting student
outcomes data may show that progress was minimal. However, knowing
that instruction in the students' native language did not occur, the
program director and evaluator can explain the resulting student
outcomes. The question to be addressed, then, is why the program
design was changed. Should the original design be reinstated?
Answers to these and other questions begin to formulate a set of
recommendations for the improvement of the overall program.

While the example above ties the evaluation of program operations to
the evaluation of student outcomes, it should be clearly understood 
that the primary purpose of this part of the evaluation is to examine 
and monitor the manner in which the program is being implemented. 
Additionally, the discrepancy evaluation design makes no attempt to 
infer or determine program impact. 

This chapter provides some basic guidelines for evaluating the 
instruction, staff development, and parent involvement components of 
the bilingual program. While there are other facets of the program 
operations that merit attention, these components are the most 
significant to the overall operation of the program. The level of 
effort allocated to the evaluation of each of these components depends
upon its emphasis and/or importance to the overall program, as 
established during the priority setting activities of the planning 
process. These issues should be addressed and resolved by the program 
director and evaluator in planning and designing the evaluation (see 
Chapter I). 

GUIDELINES FOR CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS

1. Evaluating Program Instruction

The evaluation of the instructional program is intended to answer the 
following two questions: 

1. Are planned instructional methods actually being 
used? 

2. Are changes needed in the instructional methods? 

Data needed to answer these questions are obtained by observing
classroom activities and interviewing program teachers and
administrative staff. This core of information is then compared to
baseline information, obtained through activities described in Chapter
II, in order to determine if the program is operating as intended.

The instructional program is the core of the bilingual program. The 
program director must ensure that the level of effort allocated to 
evaluate this activity is appropriate. Information on the operations
of the instructional programs is obtained by (a) conducting classroom
observations, (b) interviewing the teachers whose classrooms are
observed, and (c) conducting supplemental interviews with a sample of
program teachers and administrative staff.

Conducting Classroom Observations -- Prior to observing the classroom,
the program director should review the program description so that
program features which satisfy the goals and objectives can be
observed. The features to be observed should be identified during the
planning process. Classroom observations should become a planned
activity of the program director. Following each informal
observation, the program director should write a summary of the
classroom instruction as it was observed. These brief summaries
should be synthesized into brief reports at least three times during
the year: fall, winter, and spring. Later, these brief reports
should be used during the comparison activity and incorporated into
the final evaluation report. Thus, over time, the program director
develops a complete picture of how the classroom instruction is
actually being performed.



Topical areas that should be observed by the program director will, of
course, depend on how the particular program is designed. Some
general categories or features to observe include:

o Language use;

o Content of the lessons;

o Teaching methods;

o Diagnosis and grouping of students;

o Recordkeeping;

o Staff roles in the classrooms (teachers and
aides);

o Active participation by students; and

o Attitudes and general morale.

Conducting Teacher Interviews -- Interviews with the teachers whose
classes were observed may answer questions of whether instructional
methods have changed from the original planned instruction, the
reasons for the changes, and what changes in instructional methods may
be needed.

Supplemental Data Collection — In establishing the baseline data 
(Chapter II), interviews were conducted with program personnel, 
parents, and district personnel. A similar set of activities needs to 
be undertaken to identify information about the actual operation of 
the program. Thus, the final step in evaluating the instructional 
program is to interview a sample of program personnel, parents, and 
local and district administrators. Information obtained from these 
interviews becomes a direct link to the interview data used in 
establishing the baseline data. Comparing these two sets of data is 
crucial in identifying discrepancies. The program director should 
plan to re-interview a sample of program personnel as well as local 
and district administrators to elicit information about actual 
instructional operations. 

Once the interviews have been completed, the information should be 
synthesized by the program director and evaluator. This information 
is then compared to the baseline data so that discrepancies between 
planned and actual program operations can be noted. 

Analysis of Program Instruction Data -- A determination of whether or 
not the instructional component of the program is operating as 
intended is made by comparing baseline information about the design 
and plan of the instructional program (see Chapter II) to the 
information acquired from the evaluation of program instructional 
activities. This comparison leads to the identification of 
discrepancies between intended and actual program operations. Noted 
discrepancies identify areas or issues which may require decisions to 
correct the discrepancies. Later, these discrepancies may also be 
taken into account in the interpretation of student outcome data if 
the changes in the instructional program are determined to have 
influenced student performance. The triad of intended 
operations/instruction data, actual operations/instruction data, and 
student outcome data forms the basis for identifying final 
recommendations for the evaluation report. 
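
The comparison of intended and actual operations can be sketched in a few lines of Python. This is only an illustration, not a procedure prescribed by the Handbook: the function name, the feature names, and the sample values are all hypothetical, and real baseline and observation data would come from the program description and the observation summaries.

```python
def find_discrepancies(planned, actual):
    """Return the features whose observed value differs from the plan.

    Both arguments map a feature name to a short description.
    A feature missing from one side is reported as a discrepancy too.
    """
    discrepancies = {}
    for feature in sorted(set(planned) | set(actual)):
        planned_value = planned.get(feature, "not specified")
        actual_value = actual.get(feature, "not observed")
        if planned_value != actual_value:
            discrepancies[feature] = (planned_value, actual_value)
    return discrepancies


# Hypothetical baseline (planned) and observed (actual) program features.
planned = {
    "language use": "Spanish and English, alternating by day",
    "grouping": "by language proficiency",
    "recordkeeping": "weekly progress charts",
}
actual = {
    "language use": "Spanish and English, alternating by day",
    "grouping": "whole class",
    "recordkeeping": "weekly progress charts",
}

for feature, (intended, observed) in find_discrepancies(planned, actual).items():
    print(f"{feature}: planned '{intended}', observed '{observed}'")
```

Each discrepancy listed this way is a candidate for a decision by the program director and may later help in interpreting student outcome data.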

Interpretation and Use of Results -- The results of these analyses are 
presented to those persons responsible for decisionmaking. The 
program director reviews and analyzes the data to determine if either 
immediate or future changes should be sought in the program operations 
and instructional methods employed. Frequent and immediate reports to 
the program staff should be provided by the program director. Such 
reports enable staff to review the intended changes, identify means of 
implementing the changes, and, consequently, be a part of the program 
improvement process. 



Additional interpretation is performed by the evaluator. Using data 
from the various sources, the evaluator can examine the triad of 
intended instruction, actual instruction, and student outcomes to 
recommend changes which should be sought and ways to implement these 
changes. 

2. Evaluating Staff Development 

The evaluation of the staff development activities of the 
instructional program compares the actual training provided to 
teachers to that which was planned. The comparison provides 
decisionmakers with information about what training actually took 
place and how this training is related to the intended goals of the 
program, as well as whether the training met the needs of the program. 

Specifically, the evaluation of the staff development activities 
answers the following questions: 

1. Were the staff development activities conducted as 
   planned? 

2. Did staff training activities meet the needs 
   identified at the onset of the program? 

3. Did staff participants acquire the intended 
   knowledge and skills? 

4. Were staff satisfied with the training provided? 

5. Were skills acquired through training implemented 
   in the classroom? 

Answers to these questions, when compared to the baseline information, 
will identify discrepancies between actual staff development 
activities and intended staff training, as well as provide information 
on the actual training. A variety of data collection methods can be 
employed to obtain the data needed to answer the above questions. 
Methods such as questionnaires, knowledge tests, and observations of 
instructional techniques can be used to provide the necessary 
information. 

Questionnaires -- Information regarding satisfaction with and outcomes 
of staff training activities can be obtained by questionnaires 
completed by the program director and staff. The Designer's Manual 
provides a sample questionnaire which can be used to collect 
information on the actual staff training activities. This 
questionnaire provides information about the type and duration of 
training; numbers of program staff involved in the training; and 
planned and unmet expectations and objectives for the training. This 
data should be collected within one week following the completion of 
all training activities which occur throughout the program year, or at 
the very least, near the end of the program year. 

Appropriate analytic methods for analysis of questionnaire data are 
determined by the form of the data. The evaluator or appropriate 
member(s) of the program staff should review the questionnaire 
responses and systematically categorize the information according to 
the evaluation questions posed. 

Knowledge Tests -- A more immediate source of information on the 
impact of staff training is information derived from administering 
knowledge tests to trainees during or at the end of the training. 
These tests, devised by the instructors, should focus directly upon 
the instructional content of the training. Because of the specificity 
of such tests, no sample instruments are included in this manual. The 
results of the knowledge tests can be examined from one or more 
perspectives. The tests could be administered prior to and 
subsequent to training, thus allowing comparisons to be made between 
pre- and post-test scores. An alternative approach would be to use a 
control group not involved in the training program as a basis for 
comparison. An additional comparison could be made between the test 
results and the stated objectives of the training program. 
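
The pre- and post-test comparison, together with the comparison against a stated objective, can be illustrated with a short Python sketch. The function name, the scores, and the objective of 16 points are hypothetical values chosen for illustration; the Handbook does not prescribe any particular scoring rule.

```python
from statistics import mean


def summarize_gains(pre_scores, post_scores, objective_score):
    """Summarize knowledge-test results for one group of trainees.

    pre_scores and post_scores hold one score per trainee, in the
    same order. objective_score is the (assumed) score that the
    training objectives call for at the end of training.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each trainee needs both a pre- and a post-test score")
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    met_objective = sum(1 for post in post_scores if post >= objective_score)
    return {
        "mean_pre": mean(pre_scores),
        "mean_post": mean(post_scores),
        "mean_gain": mean(gains),
        "share_meeting_objective": met_objective / len(post_scores),
    }


# Hypothetical scores for four trainees on a 20-item test.
summary = summarize_gains([10, 12, 8, 15], [16, 18, 12, 18], objective_score=16)
print(summary)
```

A gain computed this way describes what trainees knew before and after training; without a control group it does not by itself show that the training caused the change.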

Observation of Instructional Techniques -- The classroom observation 
process should yield information on the instructional approaches that 
are actually being used by teachers. To the extent that staff 
training is expected to affect instructional approaches used by 
teachers, the data acquired from the classroom observations are also 
pertinent to determine whether or not the training accomplished its 
purposes and is being implemented as planned. For example, it may be 
possible to determine if staff development activities intended to 
provide teachers with skills that are to be used in the classroom 
(such as how to use new materials, or administer tests) were 
successful by observing the teachers in the classroom. 

Classroom observation data should be analyzed according to procedures 
described earlier in this chapter in order to identify discrepancies 
between intended and actual staff development activities. 
Specifically, the major goals of the staff training which pertain to 
teachers' instructional approaches should be compared with actual 
classroom practices as evidenced by classroom observation data. 

Interpretation and Use of Results -- The program director should 
examine the results of the analyses described above and determine if 
the goals of the staff training were met, as well as determine if 
findings related to staff training can be issued periodically 
throughout the program year, possibly in conjunction with recommended 
changes in program instructional operations. Program personnel then 
will be able to provide reactions to the recommended changes and 
identify possible approaches for implementation. 

3. Evaluating Parent Involvement 

The evaluation of the parent involvement component should address 
four questions. These questions are: 

1. To what extent did the level of parent involvement 
   match the planned level? 

2. Were parents satisfied with their level of 
   involvement? 

3. Was the program staff satisfied with the level of 
parent involvement? 

4. To what extent and in what ways has parent 
involvement changed over the life of the program? 

Data collected and used to answer these questions, when compared to 
information about the planned level of parental involvement identified 
in Chapter II, should determine if discrepancies exist. Data needed to 
answer these questions can be gathered by interviewing parents and the 
person responsible for administering the parent involvement component 
of the bilingual program. 

4. Reporting the Evaluation Results 

The information resulting from the evaluation of program operations 
should be summarized, written, and presented in the format in which it 
will appear in the Final Evaluation Report. The format for reporting 
the results will most likely be the same used to establish the 
baseline data. However, the report should contain a section on the 
evaluation findings and the recommendations being made to improve the 
program. 



CHAPTER IV 

CONDUCTING THE EVALUATION OF STUDENT OUTCOMES 



The most important goal of any educational program is to improve the 
performance of the students enrolled in the program. Therefore, 
determining student outcomes is perhaps the most important part of a 
program evaluation. The purpose of this chapter is to describe 
procedures for evaluating student outcomes. The student outcomes to 
be evaluated can be divided into the following four areas: English 
(L2) language skills; non-English or first (L1) language skills; 
academic achievement (e.g., in science, social science, and 
mathematics); and affective areas of student performance. 

Conducting an evaluation of student outcomes is neither very technical 
nor complicated if the evaluation is designed simply to describe 
student performance. A student performance evaluation is interested 
only in determining how the students in the program performed, rather 
than determining what caused the observed level of performance. An 
attempt to measure the latter requires a more comprehensive evaluation 
design than the former. These two different approaches to the 
evaluation of student outcomes are commonly referred to as evaluations 
of student performance and program impact or effectiveness. The terms 
program impact and program effectiveness are used interchangeably in 
this Handbook. 

These two types of evaluations are widely confused when conducting 
evaluations of most educational (bilingual and other) programs. In 
particular, many evaluation reports make statements about program 
impact or effectiveness when actually they have only measured student 
performance. That is, they have observed that students have done 
better (or worse) than some standard or comparison group and then have 
taken the unwarranted step of concluding that the program was 
responsible. The Designer's Manual presents a more detailed 
discussion of the distinction between these two types of evaluation. 

Evaluating Student Performance 

Evaluations of student performance and evaluations of program impact 
are both based on the same kinds of measures, such as test scores or 
other quantitative measures, such as attendance rates. In both types 
of evaluation, student scores are compared to some scale or standard 
to give them meaning. Evaluations of student performance usually 
group student standards of performance into two categories. These 
are: 

o  Absolute standards of performance which compare 
   performance such as: 

   -  Comprehension level (of textbooks, 
      newspapers, job application forms, 
      etc.); 

   -  Mastery of specific skills such as 
      language, math, or science; or 

   -  Proportion of days present in 
      school. 

These standards of performance are measurable in absolute terms. 
That is, they provide information on what a student can or cannot do 
and are not compared to any other external criteria. 

i 

o  Relative standards of performance (typically 
   reported as percentile ranks or standard scores) 
   may compare student performance against: 

   -  Norm groups (National, State, and 
      local); 

   -  Other bilingual students (National, 
      State, and local); 

   -  Groups of non-bilingual students in 
      the same school or district; or 

   -  Bilingual program students in 
      previous years. 

These, of course, are only examples. There are many other comparisons 
that can be made. However, the more comparisons made, the more 
technical the evaluation becomes, often resulting in inappropriate 
comparisons and misinterpretation of results. 

Measures of relative performance should be the backbone of student 
outcome evaluations measuring English-language skills and academic 
subjects tested in English. Performance in other languages generally 
must be measured in absolute terms because meaningful comparison 
groups will be difficult to find. 

Evaluating Program Impact 

Although determining the level of student performance should be the 
primary goal of most program evaluations, many evaluations attempt to 
go beyond this to demonstrate that the program is effective and 
responsible for the observed level of student performance. Explicitly 
or implicitly, this question of program impact underlies most 
evaluation designs. This Handbook recommends that bilingual programs 
not attempt to conduct an impact evaluation. The Designer's Manual 
does, however, provide information and guidelines for expanding the 
evaluation to determine program impact. 

The laboratory approach to answering this question would be to divide 
the students randomly into groups — one or more groups for each type of 
program--and then to compare the effects of the different programs 
after some reasonable amount of time. In practice, however, because 
of the diversity of services and the characteristics of bilingual 
students, this is almost never possible. The result is that the 
effect of a program cannot be separated from effects of other factors 
in a conclusive manner. An evaluation using data from a single 
academic year probably should not even try to prove impact. However, 
data collected over several years can probably be used to develop an 
argument that, while not completely definitive, will be reasonably 
convincing as to the impact of the program. Bilingual programs should 
attempt to collect multi-year data on student performance. 

Problems Associated With Accurate Measurement 

In addition to the issues described above, impact evaluations, as well 
as evaluations of student performance, are themselves affected by the 
measurement techniques available to measure performance. The 
predominant factor is the ability of the evaluation design and the 
evaluator to control the "noise" or, more commonly, the error of 
measurement. 



It is generally accepted that test scores include some measurement 
error, and that student performance is affected by many things outside 
of the program. Therefore, the important issues for anyone involved 
in evaluation are (1) how much noise is there in a carefully done 
evaluation? and (2) can changes be expected in students (or impacts 
due to the program) that are significant enough to be measured in 
spite of the noise factor? These issues, as well as the characteristics 
of bilingual programs which affect them, are discussed in the 
Designer's Manual. 

Because of all the problems associated with evaluation, the Handbook 
strongly recommends that evaluations of bilingual programs concentrate 
their efforts on conducting evaluations of student performance, rather 
than impact. This, together with the evaluation (description) of 
program operations, meets the Federal evaluation requirements and 
provides the program with sufficient information with which to make 
informed decisions on how to improve the program. 

1. Developing the Evaluation Design 



The first step in performing the evaluation of student outcomes is to 
determine the type of evaluation that will be conducted and what 
questions the evaluation is designed to answer. The type of 
evaluation conducted, however, must address the minimum Title VII 
requirements. 

Title VII requires that bilingual program evaluation include 
provisions for measuring the accomplishments of the instructional 
objectives, the progress of the students in improving their English 
language skills, and a procedure for using the information to improve 
the operation of the program. Meeting these requirements is 
relatively simple and can be accomplished by following the procedures 
recommended in the Handbook. In order to meet these requirements, the 
Handbook recommends conducting an evaluation of student performance, 
rather than attempting to determine program impact. This can be 
accomplished by using the basic evaluation design provided in this 
Handbook. 

The Basic Evaluation Design 

Because of the difficulty in conducting program impact evaluations, 
the recommended approach to evaluate student outcomes is simply to 
evaluate student performance. This approach is referred to in this 
Handbook as the basic evaluation or the basic evaluation design. This 
basic evaluation design, therefore, only answers the relative 
performance question, "To what extent are the bilingual students 
achieving?" 

The basic design has minimal requirements. These are: 

o  Testing only the students enrolled in the 
   bilingual program; 

o  Using adequate norm-referenced tests (NRTs) 
   capable of measuring English language skills, 
   first (L1) language skills, if applicable, and 
   academic subjects (e.g., math, science, etc.); 
   and 

o  Measuring performance for only one academic year. 

Applying these minimal design requirements to the first student 
outcome component, English language performance, is all that is 
required to meet the Federal evaluation requirements. However, most 
bilingual programs should at least evaluate performance in two other 
outcome areas, first (L1) language and academic subjects. 
Additionally, although the basic design does not require a multi-year 
evaluation design, the Handbook does recommend that bilingual programs 
attempt to collect multi-year performance data. At a minimum, 
programs should strive to collect data over the duration of their 
grant period. It is conceivable that data showing progress over the 
life of the program can be used to argue that the bilingual program 
was responsible for the outcome. 



Expanding the Evaluation 

Programs wishing to extend the evaluation beyond a description of 
student performance to measure program effectiveness and/or impact 
will need to enhance the requirements of the basic design. At a 
minimum, these evaluation designs may require three modifications. 
They will have to obtain test scores for comparison purposes from 
students enrolled in other bilingual or non-bilingual programs. 

Single-year evaluations only serve the purpose of the basic evaluation 
design and can only document if the program is effective compared to 
baseline data, but cannot show year-to-year changes. Therefore, 
evaluations attempting to measure effectiveness will most likely 
require multi-year evaluation designs capable of tracking students 
throughout their participation in the program. Multi-year evaluations 
require the use of the same measurement instruments throughout the 
evaluation period and strict recordkeeping. 

Evaluations attempting to measure effectiveness will most likely also 
need to expand their measurement instruments beyond norm-referenced 
tests. These may include criterion-referenced tests (CRTs), mastery 
tests, and other types of measures. The Designer's Manual presents a 
detailed discussion of these issues and provides options which may be 
added to the basic design in order to attempt documenting program 
impact. 

Preparing for the Evaluation 

Because the evaluation resources are limited, the evaluation may not 
be able to answer all questions. Priorities must be determined with 
respect to the most useful information to be obtained from an 
evaluation. The evaluation does not have to provide data on each 
student's learning outcome. The evaluation may provide data only on 
the students as a group. For example, measurements may be made of 
changes in reading achievement of third graders but not on reading 
achievement of a specific student in that grade. The evaluation does 
not have to provide data on sub-skills such as phonetic analysis but 
rather on general skill levels such as reading achievement. 

Certain decisions must be made before any data is collected to ensure 
that the analyses can be conducted as desired. Program goals need to 
be organized according to several key student or program features such 
as: 

o  Subject area (e.g., reading, writing, speaking); 

o  Language used in instruction (e.g., English, 
   Spanish); 

o  Student language proficiency category (e.g., 
   English: limited or proficient, Spanish: limited 
   or proficient); 

o  Grade level of students; and 

o  Year of the program. 

The Designer's Manual provides a worksheet and instructions for 
preparing the evaluation activity. 
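
One way to picture why these features must be settled before data collection is a small Python sketch that reports group results by feature. The record layout, field names, and scores below are hypothetical and are not taken from the Designer's Manual worksheet; the point is only that each score can be grouped by a feature that was recorded with it.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, keys):
    """Average the scores of all records that share the same key values."""
    groups = defaultdict(list)
    for record in records:
        group = tuple(record[key] for key in keys)
        groups[group].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothetical test records; every field except "score" is a grouping feature.
records = [
    {"subject": "reading", "language": "English", "grade": 3, "year": 1, "score": 40},
    {"subject": "reading", "language": "English", "grade": 3, "year": 1, "score": 50},
    {"subject": "reading", "language": "Spanish", "grade": 3, "year": 1, "score": 62},
]

by_group = group_means(records, ("subject", "language", "grade"))
for group, average in by_group.items():
    print(group, average)
```

A feature that was never recorded, such as the year of the program, cannot be used for grouping afterward, which is why these decisions come first.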

2. Evaluating the English Language Component 



The English language skills to be evaluated are the fundamental 
components of language use. These include knowledge of the sound 
system for oral language and comprehension of the orthographical 
system for written language. While each of the four language skill 
areas -- listening, speaking, reading, and writing -- can be 
considered individually, one component of language cannot easily be 
isolated from another. It simply cannot be assumed that mastery of 
one skill area necessarily indicates mastery of a related skill area; 
nor can it be assumed that lack of skill in one area indicates lack of 
skill in another. For this reason, the model recommends that 
proficiency in all four language skill areas be assessed. 

Three Basic Design Decisions 

For practical purposes, most programs must make three basic evaluation 
decisions: (a) which students to include, (b) what tests to use, and 
(c) what period of time to include. For each decision, the Handbook 
recommends a choice for a basic or minimal evaluation and then offers 
options that will let you answer additional questions if you have the 
necessary evaluation resources. 

o  Which students to include? The basic evaluation 
   requires only testing the students enrolled in the 
   bilingual program. An option could be to obtain 
   data from other students in the district for 
   comparison purposes. Theoretically, the bilingual 
   program staff could pick out comparison groups and 
   test them. In practice, though, this option is 
   realistic only where there is a district-wide 
   testing program, and the scores for all district 
   students are readily available on computers or in 
   some other easy-to-use form. 

o  What tests to use? The basic evaluation requires 
   a reliable, standardized, norm-referenced test 
   (NRT) of reading and other language skills. 
   Usually, the test used for district-wide testing 
   may be used. Options include criterion-referenced 
   tests, teacher-made tests, mastery tests, and 
   tests included as part of commercial instructional 
   packages. We will refer to these kinds of tests 
   generically as "CRTs, etc." 

o  What period of time to cover? The basic 
   evaluation requires covering only one academic 
   year and testing only once in the Spring. Two 
   options are highly desirable: (a) multi-year 
   designs following program students from one year 
   to the next, and (b) baseline data on program-type 
   students obtained before the program begins. A 
   sub-issue is whether to test once or twice a year. 
   The first choice should be to test only once a 
   year in the Spring. Options are (a) once a year 
   in the Fall or (b) twice a year, Fall and Spring. 



These basic choices can be summarized as follows: 

                          Basic Evaluation    Optional Additions 

1. Students               Program only        Comparison groups 
                                              from the district 

2. Tests                  NRTs                CRTs (etc.) 

3. Term of Evaluation     Single year         Multi-year 
                                              Baseline data 

   (Time of Testing)      Spring only         Fall only 
                                              Fall and Spring 



Applying the Basic Design to the English Language Component 

The basic evaluation design, through the use of a norm-referenced 
approach, provides for comparing bilingual program students to a 
national sample of students who scored at the same pretest percentile 
on a nationally-normed test. For example, if the students in the 
bilingual program scored at the 25th percentile on the pretest, their 
growth can be compared to the growth of the students in the norm group 
who scored at the same 25th percentile on the pretest. 

The norm-referenced approach makes the equipercentile assumption that 
a group of similar students who are not enrolled in the bilingual 
instructional program will maintain the same percentile rank 
throughout the year. This does not mean that the group without 
bilingual instruction is not learning. It simply means that their 
learning rate keeps them at a similar position relative to other 
students in their grade. In contrast, the students in the bilingual 
program will hopefully learn faster than they would without the program. 

The question therefore being addressed is, "Do the students in the 
bilingual program increase their percentile ranking as compared to a 
national norm group who began at the same percentile?" 
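
The arithmetic behind this question is small enough to show as a Python sketch. The function name and the posttest figure of 31 are hypothetical; the only rule taken from the text is the equipercentile assumption, under which the expected posttest percentile without the program equals the pretest percentile. The inputs are assumed to be the group's percentile standing on the nationally normed test at pretest and at posttest.

```python
def percentile_gain(pretest_percentile, posttest_percentile):
    """Gain in percentile standing relative to the equipercentile expectation.

    Under the equipercentile assumption, a similar group without the
    program is expected to stay at its pretest percentile, so the
    expected posttest percentile equals the pretest percentile.
    """
    for value in (pretest_percentile, posttest_percentile):
        if not 0 <= value <= 100:
            raise ValueError("percentile ranks must lie between 0 and 100")
    expected_posttest = pretest_percentile
    return posttest_percentile - expected_posttest


# Hypothetical group: 25th percentile at pretest, 31st at posttest.
gain = percentile_gain(25, 31)
print(f"Change in percentile standing: {gain:+d}")
```

A positive value means the group ended the year at a higher national percentile than it started. A value of zero means the group kept pace with the norm group, and it does not mean that no learning took place.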

Key Comparisons to Be Made 

There are many comparisons of performance that can be made. However, 
the five comparisons which follow are the ones that the evaluator may 
find useful and can be performed without using complex statistical 
procedures. 

1. Are the students in the bilingual program making 
gains? 

2. Is this year's student performance an improvement 
   over past years? 

3. Are students meeting the objectives of the 
   program? 

4. Are students doing better in the bilingual program 
   than in another program? 

5. Are students doing better than they would be 
   expected to do without the program? 

The first two comparisons can be easily answered by applying the 
basic design and using a norm-referenced test. The other 
comparisons require adding one or more of the options described 
earlier, such as a comparison group of students from another program. 

Selecting Appropriate Tests to Measure English Language Skills 

The criteria for selecting an achievement test to measure English 
language skills in a bilingual program are the same as those used in 
selecting a test for any evaluation. However, some criteria are more 
difficult to meet because few tests have been developed with the needs 
and characteristics of bilingual students in mind. Note also that a 
major assumption is made about the measurement of the English language 
component -- that the students learning English language skills have 
enough English language facility so that testing can occur in English. 
If this is not true, the students are likely being instructed in their 
native language and they are acquiring language skills in that 
language. 



The basic evaluation design recommends the use of a standardized, 
norm-referenced test (NRT) of reading and other language skills to 
evaluate the English language component. Most school districts now 
routinely administer one of these tests to all students. If the 
district does not use a norm-referenced test (NRT) and NRT scores are 
not readily available, the evaluator may choose to select one of the 
tests described in the Technical Appendix. These tests are 
reasonable, reliable, and valid. The main concern should be that the 
test content matches the program curriculum, at least on a general 
level. If this basic check is not made, it may later be discovered 
that the second-grade test covers third-grade curriculum, and vice 
versa. 

There are two major problems to consider in selecting NRTs. These 
are: 

o  Test level (floor and ceiling effects). In some 
   bilingual programs, the at-grade-level test is too 
   difficult for program students at pretest. The 
   next lower level may be too easy at posttest time. 
   If the mean score on a test is less than 25% of 
   the items correct or more than 75% of the items 
   correct, floor or ceiling effects probably exist, 
   and the test cannot give an accurate picture of 
   either student performance or program impact (see 
   Out-of-Level or Functional Level Testing in the 
   Technical Appendix). 

o  Multi-year and multi-grade-level requirements. 
   Most bilingual programs cover several grade 
   levels. Therefore, it is desirable to have 
   achievement tests that can be used to compare 
   progress across grades and that can be used to 
   follow groups of students as they progress through 
   the grades. In practice, this means using any one 
   of the recognized achievement tests. 
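
The 25% and 75% rule of thumb for floor and ceiling effects can be checked with a few lines of Python. The function name and the 40-item test are hypothetical; the two thresholds are the ones stated in the text, applied to the group's mean number of items correct.

```python
def check_test_level(mean_items_correct, total_items):
    """Flag a probable floor or ceiling effect from the group mean score."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    proportion_correct = mean_items_correct / total_items
    if proportion_correct < 0.25:
        return "floor effect likely"      # test probably too difficult
    if proportion_correct > 0.75:
        return "ceiling effect likely"    # test probably too easy
    return "test level appears adequate"


# Hypothetical 40-item test: group means of 9, 20, and 34 items correct.
for group_mean in (9, 20, 34):
    print(group_mean, check_test_level(group_mean, 40))
```

When either flag appears, the discussion of out-of-level or functional level testing in the Technical Appendix is the place to look for the next step.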

Using CRTs (etc.) for Evaluating the English Language Component -- The 
choice of CRTs (etc.) is more of a curriculum decision than an 
evaluation decision in most districts. That is, when developing 
objectives and curriculum materials for a bilingual program, many 
districts either develop or buy tests matched to their curriculum and 
the instructional materials. These tests are the best candidates to 
use in your evaluation. If you have important objectives for student 
performance that are not covered by any other tests, you may wish to 
develop or buy special tests just for evaluating student outcomes. 

3. Evaluating the Non-English Language Component 



Bilingual programs, for evaluation purposes, can be divided based on 
their non-English language component into three types. These are: 

o  Spanish-only programs; 

o  Single-language programs other than Spanish; and 

o  Multiple-language programs. 

The major differences among these three types of programs, from the 
evaluator's perspective, are: (a) only Spanish-English programs will 
find commercial tests readily available, and (b) multiple-language 
programs often include small groups that cannot be combined for 
evaluation purposes. 

Three Basic Design Decisions 

The three basic decisions made for the English language component also 
apply to the non-English language component: (a) which students? (b) 
what tests? and (c) what time period? However, the decisions are even 
simpler for the non-English language component, because there are 
fewer alternatives available to the evaluator. The basic options can 
be summarized as follows: 



o Which students? In general, only the bilingual
program students will speak the languages in
question, and they are therefore the only students
who can be included in the evaluation. In a few
districts, there may be comparison groups of
interest from other programs or other districts
that use the same tests. However, in most cases,
only your program students will be tested in the
non-English language, making comparison groups
unavailable.

o Which tests? A limited number of standardized
tests are available in Spanish (although their
norm groups are not analogous to those from
English-language tests, and you should not use the
norms as a simple standard of comparison). For
other languages, you are limited to, at best, a
few commercial, criterion-referenced tests, plus
locally made tests (CRTs, etc.).

o What period of time? Here, the evaluator has the
option of single-year or multi-year designs, since
baseline data before the start of a new bilingual
program could be collected. However, in practice,
few districts will do this. In general, if the
English language evaluation is multi-year, the
non-English language evaluation should also be
multi-year. Otherwise, both should be single-year
evaluations.



The decision on once-a-year (Spring) versus twice-a-year (Fall, Spring)
testing will probably also be the same for non-English testing as for
the English language testing.

The basic choices are summarized below. 



                          Basic Evaluation      Optional Additions

1. Students               Program only          None from
                                                the district

2. Tests                  CRTs, etc.            None
                          (NRTs for Spanish)

3. Term of Evaluation     Single year           Multi-year
   (Time of Testing)      Spring only           Fall only
                                                Fall and Spring

How to Select Among the Options

As you can see, the only real option
is whether to include the non-English language component in the
evaluation at all. If you want to know how your students are doing in
this area, you will almost certainly be able to produce teacher-made
tests that will serve your purposes, but you need to consider exactly
which questions you can answer with such tests.



Key Comparisons to Be Made



The key comparisons that can be made relative to non-English or first
language (L1) skill development and performance can be the same as those
made for the English language component. Measuring performance
against norms will only be possible for Spanish language performance.
Therefore, the first comparison question for other languages
will have to be answered by using locally developed mastery tests.

Answering the other questions may be done by following the same
procedures as before. Answering the fourth question, which requires a
comparison group, should not even be attempted.

Selecting Tests for the Non-English Language Component 


Selecting tests for this component is difficult because there are very
few tests available. Spanish versions are available for the
Inter-American Tests, the CTBS, and the ETS Circus test. However,
conventional non-English language norms do not exist. The
Inter-American Tests (Spanish) provide user norms based on students in
bilingual programs using that test. The norms provided with the
Spanish CTBS do not represent the population of Spanish/English
bilingual students. Norms for both tests can only provide comparison
standards for student performance evaluation, and these comparisons
are difficult to interpret. So far as the review of literature
indicated, no large-scale norm groups have been tested in any other
languages.




4. Evaluating Student Performance in Academic Areas 



Evaluation of performance in academic areas requires the specification 
of the skills to be assessed, selection of the language in which 
skills are to be measured, and the identification of appropriate tests 
in English and/or the first language of the student. The evaluator
will need to determine which skill areas are to be included in the
evaluation. Measurement of achievement in literacy as well as in
major academic subject areas may be appropriate. This determination
will have to be made on a program-by-program basis. If a student is
not literate in L1 or L2, then achievement testing will not be
appropriate. If the students are literate, the language in which to
test the students will depend upon the language in which instruction
in the particular subject has been given, as well as the fluency of
the student in that language.

The Basic Design 

Many bilingual programs include non-language, academic subjects, such 
as math, social studies, and science. The same principles that apply 
to the English language component apply to this component if testing
is done in English. A minimal evaluation would consist of (a) testing 
program students only, (b) using standardized, norm-referenced tests, 
and (c) a single-year design. Options include local comparison 
groups, longitudinal designs, and baseline data. 

Language of Testing

The major issue in evaluating performance in academic subject areas is
whether to test in English or in the first language (L1). The
evaluation will be easier to implement and the results easier to
interpret if the testing is done in English. However, as a matter of
common sense, if the students are weak in English and much stronger in
their native language (e.g., new arrivals or young children from
non-English-speaking homes), then testing in the native language may be
required. In such cases, the evaluation design principles for
non-English language components apply (see above).

Selecting NRTs 



By and large, the discussion of tests for English language also
applies to tests for academic subjects tested in English. The
discussion of non-English language tests applies to tests of math,
science, etc. in non-English languages. The basic rule here, as it
was for English language, is to use the test that is used
throughout your district. The Technical Appendix contains a
discussion on the selection of achievement tests, as well as a listing
of these tests for testing language, mathematics, science, etc.



Using CRTs (etc.) 



As in language testing, if you have test data available from your
instructional program on math, science, or other subjects, you may
want to include these data in your bilingual program evaluation. For
subjects tested in languages other than English or Spanish, you may
have to depend on teacher-made tests, and the normal cautions apply.




5. Evaluating Affective Areas of Student Performance

Affective goals, like improving student attitudes or behaviors, are
mentioned in connection with many bilingual programs. If your 
program has specific objectives in these areas and if the program 
includes specific components that are intended to change student 
attitudes or behaviors, then you should consider evaluating the 
effects of these components. However, you should be aware of two 
problems, which are discussed below. 

Affective goals must be clearly defined. In many bilingual programs,
the non-academic goals are defined in very general terms, such as
"improving self-concept." The test chosen to evaluate changes in
self-concept may be some readily available commercial attitude test
that bears very little relationship to the self-concept of the program
students. The results are almost certain to be meaningless.

If you wish to evaluate affective components of your program, then you 
must define the goals clearly, describe the components of the program
that are intended to address the goals, and then identify appropriate 
measures, such as tests, attendance records, and so on, that match 
your goals. Then you can begin to consider an evaluation design to 
evaluate absolute student performance, relative student performance, 
and program impact in the areas that you have designed. 

Affective goals are very difficult to evaluate. While the general
evaluation design principles apply theoretically, in practice it is
very difficult and frustrating to evaluate changes in attitudes,
self-concept, and so on. This is because (a) there is a great deal of
noise in the measurement, (b) most measures are insensitive to change
in attitudes, (c) attitudes change greatly from month to month and
even from hour to hour, (d) there are few good absolute criteria
available, and (e) there are seldom any very good comparison groups
available.

The net result is that few evaluations can provide convincing evidence 
of changes in attitudes or related characteristics of the students. 
For this reason, we would not advise bilingual programs to invest much
of their effort in evaluating these goals unless they are a major 
focus of the program. 

Programs wishing to measure affective areas may consult the Technical
Appendix. This volume contains a discussion of self-concept and a
listing of different tests available.

6. Conducting the Data Collection Activity

Data collection for the first component of the evaluation, program
operations, consists of obtaining student background information,
interviewing teachers, program administrators, and parents, as well as
observing classroom operations. Data collected for evaluating student
outcomes consist of test administration, scoring, and the recording of
test scores. The latter activity probably requires a higher level of
effort than the former. However, data collection for the student
outcome component requires strict discipline and very precise
procedures.

Testing the Students

Testing in the academic program areas (language, math, science, and
so on) requires the same basic procedures in every area. The main
distinction that the evaluator should make is between formal testing for
evaluating student outcomes and informal testing for diagnostic or
other instructional purposes, and out-of-level or functional-level
testing. Each type of testing and its procedures are described in
the Designer's Manual.

Scoring of Test Data

One of the issues in scoring tests and
recording the scores is whether to use computers. If the program is
very large, the answer should probably be "yes," at least for
norm-referenced tests. Many programs have access to district,
university, or state computer centers that can perform the scoring of
the tests. If these services are not available locally, the test
publishers or other scoring services can provide them. Hand scoring
and recording may still have to be performed for very small programs.
In addition, if non-standardized tests are used, it may be necessary
to score the tests by hand before entering the scores into a computer
for analysis.

Recording Test Data

Recording the scores is the final step in the
data collection process. To ensure that the scores will be usable,
the details of recording should be planned well before pretest time.
Where a commercial scoring service is used, the evaluator may have
little control over the recording process, but if the program elects
to do its own scoring or wishes to transfer scores from computer
printouts to a more convenient form, the evaluator must consider two
important issues: (a) the accuracy of the data, and (b) the details
of the data recording forms. The Designer's Manual provides more
detailed guidelines for scoring and recording the data.

7. Analyzing Student Outcome Data



The analysis of the student outcome data should be performed or at 
least supervised by a trained evaluator. The analysis of student 
performance data should simply answer the questions which the 
evaluation was designed to answer and make the necessary comparisons 
that were established during the evaluation design phase. There are
three steps in this approach:

o Examine scores for serious mistakes or unusual
results. The scores can be examined simply by
drawing the frequency distributions of test
scores. If two sets of scores are being compared
for the same students (for example, second-grade
and third-grade scores), then scatter diagrams of
one test against the other should be used.

o Compute the mean scores and standard deviations
for program (and comparison) students. If the
scores do not appear to reflect any serious
problems or unusual program effects, then simply
compute the mean score for each group of program
students (and for each group of comparison
students, if any). The standard deviation (a
measure of how spread out the scores are) must
also be calculated and reported for each group.
The mean scores are used to draw comparisons or
look for progress of the students.

o Estimate the possible effect of error on your
results. What may appear to be changes in student
performance may only be random changes in the
scores due to noise (error). Errors in mean
scores of 5 to 10 NCEs are not uncommon,
especially with small groups of students.
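
The three steps above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the NCE scores are
invented, the names summarize_scores and BIN_WIDTH are hypothetical, and
the standard error of the mean (the standard deviation divided by the
square root of the group size) is used as one common way to gauge how
much noise a group mean may contain. It is not a procedure prescribed by
the Handbook.

```python
import math
import statistics
from collections import Counter

BIN_WIDTH = 10  # assumed width of each score band in the frequency table


def summarize_scores(scores):
    """Summarize one group's scores: frequencies, mean, SD, standard error."""
    # Step 1: frequency distribution, so mistakes or odd results stand out.
    bands = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in scores)
    # Step 2: mean and sample standard deviation for the group.
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    # Step 3: standard error of the mean; it grows as the group shrinks.
    sem = sd / math.sqrt(len(scores))
    return {
        "n": len(scores),
        "frequencies": dict(sorted(bands.items())),
        "mean": mean,
        "sd": sd,
        "sem": sem,
    }


# Hypothetical NCE scores for a small group of program students.
program_scores = [32, 41, 45, 38, 50, 47, 29, 44, 36, 48]
summary = summarize_scores(program_scores)

# Print a crude text histogram followed by the summary statistics.
for start, count in summary["frequencies"].items():
    print(f"{start:3d}-{start + BIN_WIDTH - 1:3d}: {'*' * count}")
print(f"n={summary['n']}  mean={summary['mean']:.1f}  "
      f"sd={summary['sd']:.1f}  sem={summary['sem']:.1f}")
```

With only ten students, the standard error in this invented example is
already more than 2 NCEs, which is why small differences between group
means should be read with caution.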

In examining the data from the evaluation, the evaluator should check
to see if the distribution of scores resembles a normal curve (bell
shaped). If the distribution of scores is a different shape, this
could indicate possible problems with the tests, testing procedures,
the scoring procedures, or the computer programs used to process the
data. An abnormal distribution in the data may also be attributable to
the effects of the program on specific students. For example, in one
bilingual program, the mean scores could show second-grade students
making a moderate percentile or normal curve equivalent (NCE) gain in
reading. However, when individual students' scores are analyzed, it may
be found that only a few students in that grade have made very large
gains while the rest of the students have made little or no change in
their percentile standings. This information is useful to the evaluator
in concluding that the program is working for some students but not for
others. Using this finding, the program director may be able to adjust
the program for those students not showing improvement in reading.
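
The situation described above, where a moderate mean gain hides a few
very large individual gains, can be checked with a short script. The
following is a minimal sketch, assuming matched pretest and posttest NCE
scores for the same students; the data are invented, and the 10-NCE
cut-off for a "large" gain (LARGE_GAIN) is an arbitrary assumption chosen
only for illustration.

```python
import statistics

# Hypothetical matched NCE scores for one second-grade group (invented data).
pretest = [30, 35, 40, 33, 38, 36, 31, 39, 34, 37]
posttest = [31, 35, 41, 58, 38, 37, 55, 39, 35, 37]

# Individual gain for each student (posttest minus pretest).
gains = [post - pre for pre, post in zip(pretest, posttest)]

mean_gain = statistics.mean(gains)
median_gain = statistics.median(gains)

LARGE_GAIN = 10  # assumed cut-off for a "very large" individual gain
large = [g for g in gains if g >= LARGE_GAIN]
others = [g for g in gains if g < LARGE_GAIN]

print(f"mean gain:   {mean_gain:.1f} NCEs")
print(f"median gain: {median_gain:.1f} NCEs")
print(f"students with large gains: {len(large)} of {len(gains)}")
print(f"mean gain of remaining students: {statistics.mean(others):.1f} NCEs")
```

A mean gain well above the median gain, as in this invented data, is one
sign that a few students account for most of the apparent improvement.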

Another problem in analyzing the data from the evaluation is the kinds
of noise (error) that remain in even the best evaluation data.
Care should be taken to ensure that changes in students' test
scores are due not to noise but to the effects of the program.

Errors in mean scores of 5-10 NCEs are not uncommon, especially for
programs with small numbers of students. Tests of statistical
significance provide the best way of estimating the likelihood that
the results are simply examples of random error. However, tests of
statistical significance do not provide information about the
educational importance of results, since small gains can be
statistically significant for large groups of students, while what
appear to be large gains can be due to random error with small groups
of students. Tests of statistical significance also will not indicate
flaws in your evaluation procedures. Thus, individuals responsible
for conducting the evaluation should look for possible problems in the
evaluation procedures. The Designer's Manual presents a thorough
discussion of this issue.
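
The difference between statistical significance and educational
importance can be illustrated numerically. The sketch below is not a
full significance test: it assumes a rough normal approximation
(z = 1.96 for an approximate 95 percent interval) around a mean gain,
and the function names, gains, standard deviations, and group sizes are
all invented for illustration. For very small groups, a proper t-test
would give an even wider interval.

```python
import math


def gain_interval(mean_gain, sd, n, z=1.96):
    """Approximate interval for a mean gain, using a normal approximation."""
    sem = sd / math.sqrt(n)  # standard error of the mean gain
    return (mean_gain - z * sem, mean_gain + z * sem)


def excludes_zero(interval):
    """True if 'no gain at all' lies outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical large group: a small gain, but many students.
large_group = gain_interval(mean_gain=2.0, sd=15.0, n=900)
# Hypothetical small group: a large gain, but very few students.
small_group = gain_interval(mean_gain=8.0, sd=15.0, n=9)

for label, interval in [("large group", large_group),
                        ("small group", small_group)]:
    low, high = interval
    verdict = "unlikely to be noise" if excludes_zero(interval) else "could be noise"
    print(f"{label}: gain between {low:.1f} and {high:.1f} NCEs ({verdict})")
```

In this invented example the 2-NCE gain for 900 students is unlikely to
be noise yet is small in educational terms, while the 8-NCE gain for 9
students looks large but cannot be distinguished from random error.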

Analyzing the Data for Program Impact Evaluations 

Analyzing the data for program impact requires a demonstration that
the program has had an impact on student performance. It must be shown
that student performance is better than expected, and that the program
and nothing else is responsible. This does not require any special
analysis of the data. It requires the use of data from the program
operations evaluation component and student outcomes to build a
convincing argument. In addition to the three analytic steps
described above, proving program impact will require three basic
elements to build a convincing argument. These are:

o Evidence that students have improved their
performance. This type of information documents
that similar students in the same schools had
lower scores in the past. This requires compiling
data from several different years.

o Evidence that non-program students have not made a
similar improvement. This type of information
examines the possibility that something outside of
the bilingual program, such as a new principal or
a new district-wide curriculum, is responsible for
the improvement in bilingual student performance.
This information can only be generated by having
local comparison groups, preferably from
district-wide test data.

o Evidence that the characteristics of the bilingual
students have not changed since entry into the
program. In some districts, the student
population can change drastically over a period of
a year or two (as when large numbers of new
arrivals enroll). Some evidence that changes in
student population are not responsible for the
changes in student test scores must be
provided.
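
The three kinds of evidence listed above can be lined up side by side
once the yearly data are compiled. The following is a minimal sketch,
assuming mean NCE scores by school year for program students and for a
district-wide comparison group, plus the share of new arrivals in the
program each year. All figures and names are invented, and subtracting
the comparison group's change from the program group's change is a
simple difference of changes used here for illustration, not a procedure
prescribed by the Handbook.

```python
# Hypothetical mean NCE reading scores by school year (invented data).
program_means = {"1979": 34.0, "1980": 35.0, "1981": 41.0}
district_means = {"1979": 48.0, "1980": 48.5, "1981": 49.0}

# Hypothetical share of program students who were new arrivals each year.
new_arrival_share = {"1979": 0.20, "1980": 0.22, "1981": 0.21}


def change(series, first_year, last_year):
    """Change in a yearly series between two years."""
    return series[last_year] - series[first_year]


# Evidence 1: have program students improved over several years?
program_change = change(program_means, "1979", "1981")
# Evidence 2: did non-program students improve by a similar amount?
district_change = change(district_means, "1979", "1981")
# Change in program means beyond the change seen district-wide.
net_change = program_change - district_change
# Evidence 3: did the make-up of the program population shift?
population_shift = change(new_arrival_share, "1979", "1981")

print(f"program change:        {program_change:+.1f} NCEs")
print(f"district-wide change:  {district_change:+.1f} NCEs")
print(f"net of district trend: {net_change:+.1f} NCEs")
print(f"change in new-arrival share: {population_shift:+.2f}")
```

Numbers like these do not prove impact by themselves; they only organize
the evidence that the evaluator must still weigh and explain.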

Analyzing evaluation data, especially program impact evaluation, is
careful, systematic detective work. It consists of looking for clues
and following up any leads that may help to explain the effects (or
lack of effects) that are observed in the data. A clever and thoughtful
evaluator can often build a convincing case by assembling a variety of
evidence. Unless it is specifically required that the impact of the
program be assessed, it is better to spend the effort in developing
the instructional program.

8. Interpreting the Results of the Evaluation

The analysis of student outcome data described above provides the
program director and evaluator with the quantitative information on
student performance. If a norm-referenced test was used, the data
will show how the bilingual students compared in achievement to a
national norm group. Hopefully the results will show that bilingual
students achieved as well or better. These results, however, do not
provide answers as to why the students achieved. The answer to this
question may possibly be found by carefully examining the results
emanating from the evaluation of program operations.

The evaluator should understand that the two components of the
evaluation model, the discrepancy evaluation of program operations and
the evaluation of student performance, are not methodologically linked
together. As a matter of fact, each component may stand alone. The
baseline data developed for the evaluation of program operations,
however, do play a role in designing the evaluation of student
performance. That is, the baseline data provide information to
determine what outcome areas should be evaluated.

In addition, the results of the program operations evaluation can
provide the evaluator with valuable information on how the program was
operated, the instructional approach used, the amount of
instruction provided in the first language for each academic subject
area, and so on. This information can be used to "understand" the results
of the student outcomes component of the evaluation. This information
is valuable to a perceptive evaluator wishing to find answers to
explain student performance. For example, if the discrepancy
evaluation shows that history was taught using the first language to
fourth-grade students, but not to students in the fifth grade, the
evaluator may want to closely examine the test scores in history for
those two grades. Depending on what the test scores show, the
evaluator may be able to make some assumptions on what caused either
the same or different level of performance. The evaluator may then
want to more closely examine "how" the instruction was provided. For
example, the evaluator may want to ascertain the level of language
proficiency of the teacher teaching in the first language or compare
the language assessment scores, if available, of the students in the
two grades. All this information, when processed together, could
provide clues for understanding what caused the level of performance.

Because the two components of the evaluations are not methodologically 
linked, there are no specific procedures that can be described for 
merging the two sets of data. Nevertheless, the recommended approach 
provides the evaluator with a significant amount of information to use 
in arriving at conclusions about the program. The analysis techniques
required for the evaluation, as described earlier, are relatively
simple and can usually be performed by following the instructions in
the test manuals as well as the discrepancy procedures described in
this Handbook. The other ingredient is the creativity of the
evaluator and project director in their ability to use the information
to better understand the program and how it might have impacted
student performance.




CHAPTER V 
PREPARING THE EVALUATION REPORT 



Preparation of the final evaluation report is an important activity of
the evaluation. The evaluation report is the final and most visible
product of the evaluation. Steps should be taken to assure that the
report addresses the purposes and specific questions of the
decisionmakers for whom the evaluation was planned. In addition, the
evaluation results should be reported in a timely manner, taking care
to ensure that the technical aspects of the evaluation effort are
clearly presented. Together, these steps increase the usefulness of
the evaluation results.

Preparation of the final evaluation report can be a time-consuming and
burdensome process if not properly planned. However, reporting should
be a continual process occurring throughout the evaluation cycle. The
preparation and sharing of evaluation information throughout the
evaluation cycle also serves to strengthen communication between the
evaluation audiences and those conducting the evaluation, thereby
increasing the use of evaluation results.

There are a number of basic principles which pertain to the reporting
process and serve to simplify preparation of the final evaluation
report. This discussion assumes that completion of the report is the
primary responsibility of the program evaluator(s) contracted to
undertake major segments of the bilingual program evaluation.
Basically, the evaluator has three important tasks: develop an
understanding of the audiences who will use the information, select
proper reporting format(s), and assist the audiences in using the
results. Proper planning of the reporting requirements will make this
final activity easy to complete.

The evaluator must understand that clear communication requires
knowledge and understanding of the evaluation audiences. The
identification of the audiences should have been completed during the
planning stages. However, it is helpful to review who the audiences
are at the time of reporting. The evaluator should periodically
communicate with the audiences to identify their information needs and
their understanding of evaluation issues, such as testing. This will
help the evaluator to tailor the report specifically to the level of
understanding of the audiences and to determine the best form in which
to report the results. Contact with the audiences also increases the
probability that evaluation results will in fact be used.

Evaluation reports can take different forms, but whatever the form,
the report should be designed for a specific audience and be presented
in a manner that allows for response and interaction. Although the
most common format is a written report, which describes the entire
evaluation, consideration should be given to alternative versions for
various groups.

Oral presentations are also a major vehicle for reporting to
professional audiences such as teachers and program staff. Oral
presentations are particularly important for highlighting the major
findings, conclusions, and recommendations, and for establishing
two-way communication that will clarify, interpret, and influence
decisionmaking. Such presentations can be enhanced by a panel
discussion and/or small group discussions of the reported results.

Whatever reporting formats are used, the evaluator must focus on the
audience(s) and their specific needs. The amount of attention given
to the form of reporting may make the difference between a report that
is simply received and one that influences practice.

Several standard elements should be included in the report. These
include:

o Statement of purpose;

o Program overview and background;

o The goals and objectives of the bilingual
program;

o Description of the program and students;

o Discussion of the methodology used, including
design, sampling strategy, instrumentation, and
data analysis procedures; and

o Presentation of the findings, conclusions, and
recommendations for program change.

The report should be concise and should include easily interpreted
tables, graphs, and other figures, limiting the amount of narrative
material presented. Important issues should be identified and
highlighted in the report if the results of the evaluation effort are
to be maximized. Techniques such as boxing in recommendations or
using a different typeface are useful to highlight the most important
points of the report. Examples of actual data collection instruments
should be included in an appendix.



Once the written report is completed, copies must be submitted to the
funding agency. Plans should also be initiated to present the results
of the evaluation to specific audiences. Consideration must be given
to identifying the appropriate person responsible for presenting the
results. It is recommended that this be the program director and/or
the evaluator. A decision as to which of the two will report to which
audiences will be dictated by the individual situation and deserves
careful consideration.






VOLUME II 



DESIGNER'S MANUAL FOR CONDUCTING AN EVALUATION 



OF 



ESEA TITLE VII BILINGUAL EDUCATION PROGRAMS 

Prepared By:

INTERAMERICA RESEARCH ASSOCIATES, INC.
1555 Wilson Boulevard
Rosslyn, Virginia 22209 

MAY 1982 






This document was prepared for the U.S. Department of Education by
InterAmerica Research Associates, Inc. under Contract ED-300-80-0598,
funded through the ESEA Title VII Part C Research Agenda for Bilingual
Education. Contractors undertaking such projects are encouraged to
express their judgment freely in professional and technical matters.
Therefore, the statements, findings, conclusions, and recommendations
herein do not necessarily reflect the views of the U.S. Department of
Education.






TABLE OF CONTENTS
Volume II

Overview

Conceptual Framework for the Evaluation

CHAPTER I: PLANNING THE EVALUATION

   1. Select an Evaluator and Assign Responsibilities
   2. Determine the Audience and What to Evaluate
   3. Set Priorities and Establish Timelines
   4. Determine Level of Effort, Budget and Allocate Resources
   5. Plan the Data Analysis Function
   6. Plan the Interpretation Function
   7. Plan the Reporting of the Evaluation

CHAPTER II: ESTABLISHING BASELINE DATA FOR THE EVALUATION

   1. Describe the Context of the Program
   2. Describe the Students
   3. Describe the Program Goals
   4. Describe the Instructional Program
   5. Develop the Program Description
   6. Document and Report the Baseline Data

CHAPTER III: CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS

   1. Evaluate the Program Instruction Component
   2. Evaluate the Staff Development Component
   3. Evaluate the Parent Involvement Component
   4. Analyze and Interpret Program Operations Data
   5. Report the Evaluation Results

CHAPTER IV: CONDUCTING THE EVALUATION OF STUDENT OUTCOMES

   1. Developing the Evaluation Design
   2. Evaluating the English Language Component
   3. Evaluating the Non-English Language Component
   4. Evaluating Student Performance in Academic Areas
   5. Evaluating Affective Areas of Student Performance
   6. Conducting the Data Collection Activity
   7. Analyzing Student Outcome Data
   8. Interpreting the Results of the Evaluation

CHAPTER V: PREPARING THE EVALUATION REPORT

   1. Develop an Understanding of the Audience
   2. Select a Reporting Format
   3. Assist the Audience in Using the Results



OVERVIEW 



This document represents the second in a three-volume series 
constituting the Handbook for Evaluating ESEA Title VII Bilingual
Education Programs . The Handbook provides practical guidelines and 
recommended approaches for bilingual education program directors and 
evaluators to use in evaluating bilingual programs. 

In the development of the Handbook, it was readily recognized that a
single document would not be equally suitable for all bilingual
education programs. Obviously, bilingual education programs cover a
range of languages and grade levels in a variety of settings. Some
programs have large evaluation budgets and access to teams of trained
and experienced evaluators, while others have limited budgets and
limited human resources.

Therefore, this document, Volume II, The Designer's Manual for
Conducting an Evaluation, is designed to provide program directors
and evaluators with specific guidelines, recommended procedures, and
selected materials, such as worksheets and checklists, to use in the
evaluation. The manual provides the conceptual framework for the
recommended evaluation and data gathering model. The manual is
divided into five chapters, each describing one of the five activities
of the evaluation. These are: Planning the Evaluation; Establishing
the Baseline Data for the Evaluation; Evaluating Program Operation;
Evaluating Student Outcomes; and Reporting the Evaluation Results.
Each chapter presents a detailed explanation of the intended activity
and provides step-by-step procedures for using the checklists and
worksheets in conducting the evaluation. Sample evaluation
instruments, such as interview schedules and forms to gather other
types of data, are provided in reduced format. Full-size copies are
provided in the Technical Appendix.

Volume I, entitled The User's Guide to Evaluation Basics, discusses
evaluation issues and summarizes the procedures required to conduct an 
effective evaluation. The guide provides a summary description of the 
five components of a bilingual education program evaluation. The 
guide is intended for program directors, as well as for persons 
associated with the bilingual program, but not involved in the actual 
evaluation activity. 

Volume III, entitled The Technical Appendix , contains a collection of 
references covering various evaluation issues. These are intended to 
assist program directors and program evaluators in building upon or 
expanding the evaluation activities identified and discussed in 
Volumes I and II. The volume also contains full-size, reproducible 
copies of the checklists and worksheets found in the Designer's 
Manual. 

CONCEPTUAL FRAMEWORK FOR THE EVALUATION 



Bilingual education programs represent a unique instructional approach. 
They use two languages to meet their educational goals, generally 
providing instruction in academic subjects in the student's first 
(home) language (L1) while developing the English language skills of 
the students. The students served by bilingual programs also reflect 
a wide diversity in culture, socio-economic status, and educational 
experiences. These aspects distinguish bilingual education programs 
from all other instructional approaches. 

The primary goal of bilingual education programs is the development of 
English language skills of the students as well as the development of 
their home language. Teachers recruited to teach in these programs, 
therefore, need to have language skills in the two languages being 
used for instruction. Curriculum materials in the first language are 
also needed. 

Other goals of bilingual programs often include the development of the 
student's self-concept by emphasizing the home culture and the 
improvement of his or her performance in other academic subjects. In 
order to accomplish these goals, knowledge of the students' culture by 
the classroom teacher and culturally relevant curriculum materials are 
a necessity in bilingual programs. 

Due to these factors, the evaluation of bilingual education programs 
must be performed with considerable caution. The selection of an 
evaluation approach must take into consideration the variety of 
educational services, the curriculum materials used in the classroom, 
the number of hours of instruction provided in English and in the 
first language, the language skills of the classroom teacher, as well 
as the educational experience and language skills of the students. 

Because of the complexity of this educational context, experimental or 
quasi-experimental evaluation designs are often not appropriate to 
evaluate bilingual programs. Experimental designs usually require 
random selection of students. However, random selection is not 
realistic in a bilingual education context, because it would require 
that students who are eligible to receive bilingual education 
instruction be placed in alternative programs for control purposes. 
Similarly, the unique and differing characteristics of the students 
and the differences in the instructional services they receive make it 
very difficult to find the comparable comparison groups necessary for 
quasi-experimental designs. The consensus of the literature 
addressing the evaluation of bilingual programs also indicates that 
the use of standardized tests to evaluate bilingual student progress 
is of dubious value. Despite these limitations, some formal 
measurement of student academic achievement must be undertaken in 
bilingual education programs. 

The Recommended Evaluation Model 



The evaluation model presented in this Handbook, therefore, is designed 
solely to provide descriptive information about the operation of the 
bilingual program and the academic performance of the students 
enrolled in the program. The information gathered through this 
process can be used to evaluate student progress and, to some degree, 
provide a barometer of program effectiveness. The model is based on 
the premise that an evaluation of a bilingual program should: 

o Provide descriptive information about the 
operations of the bilingual program; and 

o Provide information describing student performance 
(even if hindered from making inferences about 
program impact) . 

Therefore, the model requires the collection of student outcome data 
to determine if the students are making progress in their learning. 
It also requires the collection of information on "how" the program is 
operating. 

The model is also practical and realistic in relation to the financial 
and human resources available to conduct evaluations of bilingual 
programs. Aside from the expertise and time of the immediate 
personnel of most programs, the majority of bilingual programs have 
limited funds (generally between $2,000 and $5,000 per year) to secure 
private consultants to perform or assist with the evaluation. 
Therefore, the model takes into consideration the amount of time and 
effort that can reasonably be expected to be given to the evaluation. 

The recommended evaluation model consists of two components. The 
first component focuses on program operations (e.g., program goals, 
time spent on instruction, etc.) using a discrepancy evaluation 
design. The design relies heavily on descriptive data about 
the program and therefore requires, as an initial step, the establishment 
of comprehensive baseline data on the program, the students, and the 
community. 

The actual evaluation and data collection activities needed for the 
evaluation of program operations are performed primarily through 
search and review of program documents such as the grant proposals, 
previous evaluation reports, student files, and related material, as 
well as personnel interviews and the monitoring of classroom 
instruction. Personnel interviews to gather information on how the 
program is being operated are conducted with the program director, 
teachers, district administrators, and parents. Monitoring of 
classroom instruction is performed through observation to determine if 
the instruction is being carried out as planned and in accordance with 
the original program design. 

The discrepancy evaluation attempts to identify and document 
differences between the initial plans of the program and the actual 
manner in which the program is operating. Information about 
discrepancies between the planned and actual program activities, as 
identified by the discrepancy evaluation, may be used to make 
decisions on how to continue operation of the program and what changes 
might be required. 
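
The comparison of planned and actual program activities described above can be sketched in a few lines of Python. This is only an illustration: the element names and figures are hypothetical, and the Handbook itself does not prescribe any particular record format. Each program element from the grant proposal is compared with what interviews and classroom observation found, and every difference is listed for the evaluator to interpret.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    element: str     # program element being compared
    planned: object  # value stated in the original program design
    actual: object   # value documented during the evaluation


def find_discrepancies(planned: dict, actual: dict) -> list:
    """Return one record per element whose actual value differs from the plan."""
    found = []
    for element in sorted(set(planned) | set(actual)):
        p = planned.get(element)  # None if the element was never planned
        a = actual.get(element)   # None if the element was never observed
        if p != a:
            found.append(Discrepancy(element, p, a))
    return found


# Hypothetical figures, for illustration only.
planned = {
    "hours_english_per_week": 10,
    "hours_first_language_per_week": 15,
    "bilingual_teachers": 4,
}
actual = {
    "hours_english_per_week": 10,
    "hours_first_language_per_week": 9,
    "bilingual_teachers": 3,
}

for d in find_discrepancies(planned, actual):
    print(f"{d.element}: planned {d.planned}, actual {d.actual}")
```

The sketch only documents differences. Deciding whether a given difference matters, and what program change it calls for, remains a judgment for the program director and evaluator.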

The second component of the model requires the assessment of student 
outcomes. The student outcomes to be evaluated are: 

o English language skills; 

o First language skills; 

o Academic achievement; and 

o Affective areas of student performance. 

Because of the difficulty in conducting program impact evaluations, 
the recommended approach to evaluate student outcomes is simply to 
evaluate student performance. This approach is referred to in this 
Handbook as the basic evaluation or the basic evaluation design. The 
basic evaluation design, therefore, answers only the relative 
performance question, "To what extent are the bilingual students 
achieving?" 

The basic design has minimal requirements. These are: 

o testing only the students enrolled in the 
bilingual program; 

o using adequate norm-referenced tests (NRTs) 
capable of measuring English language skills, 
first (L1) language skills, if applicable, and 
academic subjects (e.g., math, science, etc.); 
and 

o measuring performance for only one academic year. 

Applying these minimal design requirements to the first student 
outcome component, English language performance, is all that is 
required to meet the Federal evaluation requirements. However, most 
bilingual programs should at least evaluate performance in two other 
outcome areas, first (L1) language and academic subjects. 
Additionally, although the basic design does not require a multi-year 
evaluation design, the Handbook does recommend that bilingual programs 
attempt to collect multi-year performance data. At a minimum, 
programs should strive to collect data over the duration of their 
grant period. It is conceivable that data showing progress over the 
life of the program can be used to argue that the bilingual program 
was responsible for the outcome. 

Data resulting from the analysis of student outcomes can be used as an 
indicator of overall student performance. The data from this 
component of the evaluation, in conjunction with the discrepancy data, 
can be used to determine what program changes, if any, may be required 
to improve student performance. 

For example, the discrepancy evaluation of program operations may 
reveal a significant operational change from the original design of 
the instructional program. This change could have had considerable 
impact on the instructional program, to the extent that student 
performance may have been affected. Knowing this, the evaluator will 
be able to analyze and interpret the outcome data affected by this 
change and make recommendations for changes in the program. 

In summary, the purpose of the recommended evaluation model is to 
describe student performance and program operations. It cannot be 
used as a measure of program impact. The recommended model meets all 
the requirements established in the Title VII rules and regulations. 
The regulations require that each grant have a plan to evaluate the 
progress and achievements of the bilingual program. The plan must 
include: 

o provisions for measuring the accomplishments of 
the instructional objectives of the program; 

o provisions for measuring the progress of the 
students in improving their English language 
skills; and 

o a procedure for using the information gained from 
the evaluation to improve the operation of the 
program. 

The recommended evaluation model accomplishes this by: 

o performing an evaluation of program operations 
using a discrepancy evaluation approach; 

o conducting an assessment of student performance in 
developing English language skills, as well as 
first language skills and performance in academic 
subjects; and 

o conducting an analysis function to determine what 
changes may be required to improve the overall 
operations of the bilingual program. 

The Handbook recommends that bilingual programs not attempt to 
determine program impact. However, some basic guidelines for 
extending the evaluation to determine impact are presented as optional 
activities to the basic evaluation design. Extending the evaluation 
beyond the basic design, however, may require more resources than those 
normally possessed by Title VII bilingual education programs. The 
Handbook also does not address entry and exit procedure issues. These 
procedures are, however, very much intertwined with evaluation of 
bilingual programs and should be considered when planning the 
evaluation. 

CHAPTER I 



PLANNING THE EVALUATION 



Planning is the single most important task in conducting an 
evaluation. Although this point seems obvious, research indicates 
that many evaluations of bilingual programs, as well as evaluations of 
other educational programs, are not properly planned. Many 
evaluations occur towards the end of the program year as a last-minute 
thought, simply to produce a report to satisfy some external 
requirement, usually imposed by the funding source. As a result, they 
are often performed haphazardly and produce poor results. 

Evaluations performed in this manner are of little use to either the 
program itself or the funding agency. These evaluations usually fail 
to address issues that program and school administrators may have 
about the program because the evaluation design failed to incorporate 
their concerns during the planning process. Likewise, these 
evaluations will not be helpful to the funding agency since, at best, 
they were planned too late in the program year to capture useful 
information and, at worst, merely represent perfunctory efforts to 
fulfill a reporting requirement. 



The evaluation process, to achieve its own objectives, must be 
approached in a serious manner and receive as much priority as other 
elements of the educational program. Program administrators must 
realize that the evaluation process is a positive activity designed to 
provide information on which to base decisions for program 
improvement. 

The planning process carefully balances the reporting requirements of 
the funding agency, the information needs of decisionmakers and 
program administrators, and the scarce resources available to conduct 
the evaluation. It is unlikely that any given bilingual program will 
have the resources needed to address all the information needs of its 
different audiences. Therefore, all parties concerned must realize 
that compromises will have to be made; otherwise, resources will be 
scattered, producing little useful information. 

A properly conducted evaluation requires more than simply evaluating a 
specific component of a bilingual program (e.g., student performance). 
An effective evaluation plan identifies all the questions about the 
program that the evaluation should answer. 

The evaluation planning process, therefore, involves a series of 
carefully executed steps which identify the evaluation audience and 
their specific information needs, set priorities, determine which 
program components to evaluate, allocate scarce evaluation resources, 
and set timelines for the evaluation process. 

Next to proper planning, effective management of the evaluation 
process is a must. One person must assume the responsibility and have 
the authority to direct and manage all facets of the evaluation. All 
persons involved in the evaluation process must be made aware of this 
authority and be given instructions and directions on how to interact 
with that person. A clear chain of command must be delineated. In 
most Title VII programs, the program director retains and assumes that 
responsibility. For purposes of presentation, this Handbook assumes 
that the program director is the person responsible for ensuring that 
the evaluation is planned and conducted. 

1. Select an Evaluator and Assign Responsibilities 

Proper planning and effective management of the evaluation dictate 
that the person responsible for designing and conducting the more 
technical aspects of the evaluation be identified as early as possible 
in order to become involved in the early decisionmaking of the 
evaluation planning process. In the case of most Title VII programs, 
this person is usually an independent consultant from outside the 
school system. Ideally, the evaluator should be involved in the 
original design of the bilingual program itself. In the case of Title 
VII programs, this should occur during the proposal writing stage. 
This would enable the evaluator to begin working with the program 
director in planning the evaluation before the academic period to be 
covered by the evaluation commences. The plan for conducting the 
evaluation, if at all possible, should be completed prior to the first 
day of school of the academic year being evaluated. 

A major responsibility of the program director is to survey the 
available human resources in the district and, assuming he or she has 
the authority, decide whether to use an evaluator from within the 
school system or employ an independent evaluator. The possibility of 
contracting for the services of an evaluation specialist from a 
university or a private consulting firm must be weighed against the 
potentially lower cost to the program if the evaluation can be 
conducted by district personnel. The program director must decide on 
a course of action. 

The program director should attempt to ensure that the person selected 
as the evaluator has a thorough understanding of the goals and 
objectives of bilingual education and is experienced in using 
measurement and evaluation techniques with limited-English-proficient 
students. Because it may be difficult to find a skilled evaluator who 
understands the special problems of bilingual programs, it may be more 
desirable, if affordable, to select a team of evaluators who, as a 
group, possess all the experience and required skills. 

Assuming that an independent consultant or a consulting firm is 
contracted to perform or provide assistance in conducting the 
evaluation, the program director should assign clearly defined 
responsibilities and specific assignments to the evaluator, the 
program personnel assisting with the evaluation, and himself or herself. 

The program director, assuming he or she is the person in charge, must 
take the lead in delineating these responsibilities, determine the 
evaluation objectives and information needed from the evaluation 
activity, and ensure that the evaluation is successfully conducted. 
In this respect, the evaluation of the bilingual program is just 
another activity that the program director manages and ensures is 
implemented as planned. 

The evaluator's function and responsibilities are usually determined 
by the amount of technical assistance needed by the program director 
in carrying out the evaluation. The evaluator's role is therefore 
generally narrower in scope, focusing more on technical matters such 
as test selection, designing data collection procedures and 
instruments, conducting data analyses, and reporting the evaluation 
results. The evaluator may often serve as a technical consultant to 
the program director during the planning and implementation stages. 
This role of technical advisor and consultant can be valuable to the 
program, since the evaluator can provide immediate, informal feedback 
on how the program is being implemented. Often, problems of program 
design, implementation, and management can be identified and remedied 
in the early stages of the evaluation process. The evaluator can also 
help project personnel to understand technical issues associated with 
testing, diagnosis, and program design. 

An independent evaluator may be able to point out instances in which 
the relationship between program objectives and program activities is 
tenuous or unreasonable, a relationship perhaps difficult for program 
personnel to observe easily. In this role, the evaluator can be used 
as a sounding board to determine whether there is a logical and close 
connection between what the program intends to accomplish and what the 
program is in fact doing. This logical nexus between program goals 
and program activities will provide the most convincing evidence that 
the program is responsible for the outcomes observed. 



Listed below are some guidelines for distinguishing between the roles 
of the program director and evaluator. These guidelines take into 
consideration the fact that the majority of the evaluation activities 
will actually be conducted by the program director and program 
personnel. 

The program director should: 

o Define program goals and objectives; 

o Describe the intended program; 

o Describe student characteristics; 

o Identify target audiences for the evaluation; 

o Determine the major areas to be covered by the 
evaluation; 

o Identify possible evaluators and, in some cases, 
select the evaluator(s) or at least recommend the 
evaluator(s); 

o Serve as a liaison with the evaluator (or appoint 
a staff member to serve as liaison); 

o Review the evaluation design prepared by the 
evaluator to make sure it meets the evaluation 
needs; 

o Arrange interviews or write cover letters to 
questionnaires to ensure timely response and 
cooperation; 

o Monitor classroom operations and observation 
activities; 

o Assign specific evaluation activities to program 
personnel; 

o Identify trained personnel (and suggest specific 
persons) who should be involved in data analysis 
and interpretation; and 

o Review data and react to interpretations and 
recommendations before they are included in the 
report. 

The evaluator should: 

o Design the evaluation based on the information 
needs identified by the program director; 

o Select and/or review instruments to be used in the 
evaluation; 

o Monitor testing; and 

o Analyze the data and report findings. 

A clear delineation of responsibilities and responsible management 
will ensure that all evaluation activities are performed effectively 
and on schedule. 

2. Determine the Audience and What to Evaluate 

Determining which components of the program to evaluate is obviously a 
most critical decision. However, this decision is always influenced 
by the different parties involved with the bilingual program. 
Consequently, the decision of what to evaluate is largely determined 
by the evaluation needs of these parties, as well as the financial and 
human resources available to conduct the evaluation. 

Thus, the first step in determining what to evaluate is to determine 
who needs information from the evaluation, what type of information is 
needed, and for what purposes. In addition to program administrators 
and other personnel associated with the program, the typical users of 
evaluation information include: 

o The funding agency; 

o District administrators; 

o School board; and 

o Parents and community groups. 

Each audience has different interests and needs. Therefore, the 
evaluation design must address the different needs of each audience 
and provide the information desired, while remaining within the 
budgetary constraints of the program. 

Evaluations of ESEA Title VII funded programs, however, must pay 
particular attention to the rules and regulations pertaining to these 
programs. Embodied in these rules and regulations are a number of 
provisions that should be viewed as minimum evaluation criteria. 
Therefore, the evaluation requirements for basic and demonstration 
projects, as described in section 123a.22 of the April 4, 1980 Federal 
Register (Vol. 45, No. 67), must be considered in planning the 
evaluation. 

These regulations require that any program funded under Title VII of 
the Elementary and Secondary Education Act (ESEA) of 1968, as amended, 
must have a plan to evaluate the progress and achievements of the 
bilingual program. The plan must include: 

o provisions for measuring the accomplishment of the 
instructional objectives of the program; 

o provisions for measuring the students' progress in 
improving their English language skills; and 

o a procedure for using the information gained from 
the evaluation to improve the operation of the 
program. 

Worksheet No. 1, which follows, is a useful tool for identifying 
the various audiences that need information from the evaluation, the 
type of information needed by each audience, the reason that the 
information is needed, and the time at which the information is 
needed. The Worksheet is also designed to help the program director 
plan and prepare for reporting the evaluation results to all the 
audiences. After filling out all the information required on this 
Worksheet, the program director can determine the comprehensiveness 
and depth that the evaluation will require, as well as how many of the 
evaluation needs can be met with the resources available. 

If the resources available will not permit the evaluation to assess 
all the issues or program areas, the program director, evaluator, and 
all other parties concerned will have to set priorities for the 
evaluation. This will inevitably require that concessions and 
compromises be made by all parties. 

How to Use Worksheet No. 1: In the first column, indicate the group 
or groups of people who will need information from the evaluation and 
who will receive the evaluation report in whole or in part (funding 
agency, district administrators, school board, etc.). A brief 
statement indicating the type of information needed by each audience 
should be written in the space provided under column two. Indicate 
under column three why the information is needed. This statement 
will, to a large extent, determine the type of report (column five) 
needed by this particular group (oral, written, executive summary, 
etc.). The statement will also determine which section of the report 
should be emphasized in the cover letter. For example, if the school 
board is the intended audience and it is trying to determine program 
impact on English language development, the cover letter should 
emphasize the section on student outcomes. Column four indicates when 
the information is needed in order to provide adequate time for the 
audience to react to the report. 

This Worksheet, when used properly, provides a global picture of the 
evaluation and helps pinpoint the types of information that need to be 
collected during the course of the evaluation. It also helps to 
determine what the evaluation report will contain. It further helps 
to specify those points at which feedback on the evaluation report, in 
draft form, must be sought prior to producing the final version. 
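
For programs that keep their planning records electronically, the five columns of Worksheet No. 1 can be held as simple records. The following Python sketch is one possible arrangement, not part of the Handbook: the field names, the sample audiences' entries, and the dates are all hypothetical. Sorting the rows by the date in column four gives the order in which reports must be ready.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AudienceEntry:
    audience: str            # column one: who needs the information
    information_needed: str  # column two: type of information needed
    reason: str              # column three: why it is needed
    date_needed: date        # column four: when it is needed
    report_type: str         # column five: report type and section to emphasize


def reporting_schedule(entries):
    """Order worksheet rows by the date each audience needs its information."""
    return sorted(entries, key=lambda e: e.date_needed)


# Hypothetical entries, for illustration only.
entries = [
    AudienceEntry("School board", "English language progress",
                  "Decide on program continuation", date(1983, 6, 1),
                  "Written report; emphasize student outcomes"),
    AudienceEntry("Funding agency", "Compliance with evaluation plan",
                  "Annual reporting requirement", date(1983, 5, 15),
                  "Written report"),
]

for e in reporting_schedule(entries):
    print(e.date_needed.isoformat(), e.audience, "-", e.report_type)
```

Draft-review points can then be set by working backward from the earliest date in the schedule.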



WORKSHEET NO. 1 

(page 1 of 1) 

DETERMINE AUDIENCE AND INFORMATION REQUIREMENTS FOR THE EVALUATION 

Column headings: Audience; Type of Information Needed; Reason 
Information is Needed; Date Information Needed; Type of Report and 
Section to Emphasize in Cover Letter 

[Blank worksheet form] 



3. Set Priorities and Establish Timelines 

The establishment of evaluation priorities is a must for all bilingual 
programs. Most bilingual programs allocate $3,000 to $5,000 of their budgets 
for the purchase of outside consulting assistance to perform the 
evaluation. This amount of money, together with the level of effort 
that can be devoted to this one task by the program director and the 
rest of the program personnel, constitutes the available resources to 
conduct the evaluation. More than likely, the evaluation needs 
identified by the exercise described above will far exceed what can be 
accomplished by these resources. Consequently, priorities for the 
evaluation will have to be established. 

The program director must analyze the evaluation needs identified and, 
based on the information identified earlier and the known resources, 
ask the following questions: 

o How much can I evaluate? 

o How much do I need to evaluate? 

o How much evaluation assistance can I afford? 

Answers to these and similar questions will assist in prioritizing the 
different elements of the program to be evaluated. 

Additional questions, such as the ones below, will also help to 
determine priorities: 

o Is information on the program's capacity to meet 
Title VII regulations already available? If 
information is available, this information can be 
easily incorporated in the evaluation. 

o What are the priority areas (e.g., parent 
involvement) of the program? The evaluation 
effort should give these areas priority. 

o How are the program resources divided among 
program components? Areas receiving a large 
proportion of program resources should be 
candidates for evaluation emphasis. 

o If there are insufficient resources to adequately 
evaluate all components, are there areas that 
should not be evaluated or should the scale of the 
evaluation be reduced in some or all areas? This 
decision would be made after considering which 
areas are already fairly well understood, which 
areas are a low program priority, and whether the 
evaluation resources are so limited that it would 
be best not to evaluate them at all rather than to 
conduct a general assessment of all areas. 

o Which components must be evaluated each year? 

Answering some or all of these questions will assist the program 
director to determine what must and can be evaluated. Still another 
exercise to help in the priority-setting process is to break down all 
the different elements of the total program and prioritize each 
element of the program based on the information the audience needs, as 
well as the Title VII requirements. After prioritizing all the 
program elements according to these criteria, a final priority listing 
can be developed based on the amount of evaluation activities that can 
be performed with the available financial and human resources. This 
determination is based on the estimated level of effort that each task 
of the evaluation will require. Estimating level of effort is 
discussed in the next section. 
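
One way to picture the step from prioritized elements to a final priority listing is the short Python sketch below. It is an assumption-laden illustration rather than the Handbook's procedure: the component names, priority ranks, day estimates, and the simple rule of taking components in priority order until the available person-days run out are all hypothetical.

```python
def final_priority_listing(components, available_days):
    """Pick components in priority order until the available effort is used up.

    components: list of (name, priority, estimated_days); priority 1 is highest.
    Returns (selected, deferred) lists of component names.
    """
    selected, deferred = [], []
    remaining = available_days
    # sorted() is stable, so components with equal priority keep their input order.
    for name, priority, days in sorted(components, key=lambda c: c[1]):
        if days <= remaining:
            selected.append(name)
            remaining -= days
        else:
            deferred.append(name)
    return selected, deferred


# Hypothetical components and effort estimates, for illustration only.
components = [
    ("Parent involvement", 3, 6),
    ("English language skills", 1, 10),
    ("Program operations", 2, 8),
    ("First language skills", 2, 7),
]
selected, deferred = final_priority_listing(components, available_days=25)
print("Evaluate this year:", selected)
print("Deferred:", deferred)
```

Listing the deferred components explicitly keeps any omission a deliberate decision that the program director and evaluator can review, rather than an accident of exhausted resources.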

How to Use Worksheet No. 2: This Worksheet is designed to assist the 
person planning the evaluation to gather information in order to 
establish priorities. This form should be completed by the program 
director and discussed with the evaluator. The evaluator can also use 
the Worksheet as a general guide for developing the evaluation 
design. 

Depending on the answers to the questions above, the program director 
and evaluator will be able to prioritize the different evaluation 
needs. To use the Worksheet, insert a "1" by components which should 
receive maximum emphasis, a "2" by those receiving moderate emphasis, 
a "3" by the components that will receive minimum emphasis, and an "X" 
by the components not to be evaluated. After completing Worksheet No. 
2, the program director and the evaluator should review the 
information to ensure that priorities set by funding agencies as well 
as priorities of the program are adequately represented. 

As noted earlier, planning a useful evaluation involves a careful 
balancing of priorities and a sensible allocation of resources. It is 
unlikely that an evaluation effort can address all possible components 
and issues in any one year. Furthermore, it is more important to do a 
thorough evaluation of the most important parts of the program than to 
do a general evaluation of all program components. Therefore, it is 
important that priorities be set intentionally rather than 
arbitrarily. A program component should not be omitted from the 
evaluation because of oversight or because the evaluation resources 
were exhausted before that component could be addressed. 

After the evaluation priorities have been determined, the program 
director should establish timelines for completing the different 
components and the total evaluation of the program. The program 
director needs to understand that certain elements of the evaluation 
must be performed at very specific times during the academic year and 
cannot be delayed or postponed. Additionally, the program director, 
in determining timelines for tasks assigned to specific individuals, 
has to consider other responsibilities of evaluation team members. 
Responsibilities and assignments may have to be modified as a result of 
the established timelines. 

How to Use Worksheet No. 3 — This Worksheet can be used as a baf 
chart to depict all evaluation activities. In the space provided 
under the Task heading, indicate all the major tasks and subtasks 
required to perform each of the evaluation ' activities. ^Jhis exercise 
will help the planner to "think through" all of the steps required. 
The months of the "school year" are then depicted next to the tasks. 
To use the bar chart, simply place a line through whatever period of 
time each task will require. This activity will force the planner to 
determine which activity must be performed at what time. 
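The bar chart described above can also be drafted as plain text before it is copied onto the worksheet. This is a minimal sketch under stated assumptions: the September-to-June calendar, the task names, and the dates are hypothetical and are not taken from the Handbook's sample worksheet.

```python
# School-year months used as chart columns (an assumed September-June calendar).
MONTHS = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun"]


def bar_chart_row(task, start, end, width=28):
    """Return one chart row with a line drawn from `start` through `end`."""
    first, last = MONTHS.index(start), MONTHS.index(end)
    if first > last:
        raise ValueError(f"{task}: start month falls after end month")
    cells = ["----" if first <= i <= last else "    " for i in range(len(MONTHS))]
    return f"{task:<{width}}" + "".join(cells)


def bar_chart(tasks, width=28):
    """Build the full chart: a header of month names, then one row per task."""
    header = " " * width + "".join(f"{month:<4}" for month in MONTHS)
    rows = [bar_chart_row(name, start, end, width) for name, start, end in tasks]
    return "\n".join([header] + rows)


# Hypothetical tasks and dates, for illustration only.
example_tasks = [
    ("Select tests", "Sep", "Sep"),
    ("Pretest students", "Oct", "Oct"),
    ("Observe classrooms", "Nov", "Apr"),
    ("Posttest students", "May", "May"),
    ("Write final report", "May", "Jun"),
]

if __name__ == "__main__":
    print(bar_chart(example_tasks))
```

Rows whose lines overlap in the same months show where one person may be assigned too many tasks at once.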

A sample Worksheet, already filled out, is attached. A blank one is
included in the Technical Appendix.

4. Determine Level of Effort, Budget and Allocate Resources

One of the most difficult tasks in managing the overall evaluation is
deciding how best to utilize the limited resources available, and yet
meet all the evaluation needs. The assignment of responsibilities and
activities to those contributing to the evaluation process is often
difficult. Because most of the evaluation activities pertaining to
Title VII programs are usually performed by the program personnel,
coordinating time schedules to permit evaluation in addition to other
program responsibilities can create great problems, especially if
human and financial resources are limited. Nevertheless, the timely
execution of the evaluation is essential. There are activities within
the evaluation process that can be rescheduled; however, others must
be performed as planned in order to produce a reliable product. The
effective program director must exercise initiative and
resourcefulness to ensure that this is accomplished.



How to Use Worksheet No. 4 - The effective program director uses as
many tools as possible. Worksheet No. 4, the Operating Checklist for
Bilingual Education Program Evaluation, which follows, can be used as
a checklist to ensure that the evaluation plan contains all the
elements needed and that they are initiated and successfully
completed.

How to Use Worksheet No. 5 - Worksheet No. 5, the Evaluation Summary
Guide, summarizes all evaluation activities by program component for
easy monitoring of the entire evaluation. By using this checklist,
the program director can easily monitor activities, coordinate time
schedules and assignments, and continue to plan and make appropriate
modifications in the management of the evaluation. All members of the
evaluation team should have a copy of Worksheet No. 5, so that the
entire team has an understanding of the evaluation process, the role
of each person in performing the evaluation, and the deadline for each
evaluation activity.

OPERATING CHECKLIST FOR BILINGUAL EDUCATION
PROGRAM EVALUATION

EVALUATION STEPS

1. Planning, Managing, and Staffing the Evaluation

   1.5 Develop overall management plan of evaluation

   1.6 Hire outside evaluator

   1.7 Assigning evaluation responsibilities to
       staff

2. Planning Data Collection for the Evaluation

   Scheduling the testing for the evaluation

Determining how much of the evaluation should be conducted by program
personnel, which activities should be performed by an independent
contractor, and how much the total evaluation should cost is often
difficult for many program directors. Districts with limited contract
evaluation funds should use most of their contract funds to employ a
trained and experienced evaluator to assist them in evaluating the
student outcomes component of the evaluation. Other evaluation tasks,
such as describing and monitoring program operation, can be performed
by the program director with assistance from the program personnel as
a normal part of program management. However, the evaluator should be
consulted when performing these tasks. If project or district 
personnel are going to be employed to perform the evaluation, the 
program director must make specific assignments and ensure that the 
evaluation activities are performed on schedule. 

A major step in planning and managing the evaluation, therefore, is
determining the level of effort that will be required by each activity
of the evaluation (e.g., evaluating student outcomes) and allocating
adequate financial and human resources to the individual tasks to be
performed. Evaluation resources, financial and human, will vary
widely from district to district. Additionally, the level of effort
for an evaluation is affected by a number of factors, such as:

o Size of the program;

o What aspects of the program are evaluated; 

o The number of non-English languages represented in 
the population being served by the program; and 

o The scope of the evaluation. 

The Estimated Level of Effort Worksheet (Worksheet No. 6) may be used 
to estimate the amount of effort which the evaluation will require. 

The Worksheet suggests three different estimated levels of effort that
can be applied in evaluating each program component and the different
tasks within each component. These estimates are based on discussion
with persons who have conducted these evaluation activities. The
three levels are defined as minimum, moderate, and major. The amount
of evaluation activity that can be performed using the minimum level
of effort may not provide adequate data for local use, but will most
likely satisfy evaluation requirements of the funding agency. The
amount of effort indicated for the moderate and major categories
represents more realistic estimates of the effort required to perform
an adequate evaluation of each program component. The major level
category does not include all of the possible evaluation activities
that could be included; rather, it establishes a level for a set of
activities which will provide adequate data for most programs. Using
Worksheet No. 6, the program director can select the desired level of
effort for each component.

Using this worksheet, the costs associated with the evaluation are
easily identifiable. The hourly costs of district and program
personnel, including support staff, are known to the finance office.
The number of hours that will be dedicated to the effort by each 
person multiplied by their hourly wage rate determines the direct cost 
to the program. This assumes no overhead for the district. Also, in 
some districts, trained evaluation specialists may be available at no 
cost to the program. 
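The direct-cost calculation described above is simple arithmetic: hours multiplied by hourly wage, summed over staff. The following sketch assumes hypothetical staff titles, hours, and wage rates; none of these figures come from the Handbook.

```python
def direct_personnel_cost(assignments):
    """Sum hours x hourly wage over all staff; no district overhead is added."""
    return sum(hours * hourly_rate for hours, hourly_rate in assignments.values())


# Hypothetical staff, hours, and hourly wage rates in dollars.
example_assignments = {
    "Program director": (40, 15.00),
    "Resource teacher": (60, 10.00),
    "Secretary": (20, 6.00),
    # A district evaluation specialist provided at no cost to the program.
    "District evaluation specialist": (30, 0.00),
}

if __name__ == "__main__":
    total = direct_personnel_cost(example_assignments)
    print(f"Direct personnel cost: ${total:,.2f}")
```

Entering a rate of zero for staff provided at no cost keeps their hours visible in the plan without adding to the program's direct cost.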



The summary section of Worksheet No. 6 enables the program director to
summarize the level of effort required to evaluate each program
component. It also summarizes the level of effort which will be
assigned to district or program personnel and to the evaluator. After
reviewing the summary and total level of effort required, the program
director can decide whether the evaluation, as planned, is affordable.
If not, decisions will have to be made to either streamline the
evaluation effort or to seek additional resources.

A decision on the level of effort to be assigned to the independent
evaluator or to the consulting firm, assuming the contract route was
employed, will have to be made as early as possible. Once a
contractual obligation is entered into, the district will be liable
for meeting that contract. Using the worksheet, the program director
can determine which of the evaluation activities will be performed by
the evaluator or when he or she will provide assistance. Adding up the
total number of days that the evaluator will provide and multiplying
this total by the daily rate of the evaluator will determine the cost
of the service.

The costs for independent evaluation consultants are usually standard,
but do vary if obtained through a consulting company. The following
figures may be used to estimate their costs.

     Independent evaluator                  $[?]-$150 per day
     (no overhead)

     Evaluator contracted through           $250-$300 per day
     evaluation company
     (overhead included)

     Senior evaluator contracted            $300-$400 per day
     through major educational
     research company

By using the worksheet, the program director will be able to determine 
what the services of the evaluator will cost and if all the work that 
needs to be performed by the evaluator is within the budget 
allocation. When summarizing costs for the evaluation of each 
component, use actual cost rates. Other cost items will include 
purchasing tests, computer time, and report preparation. 
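The budget check described above can be sketched as follows. The component names, day counts, other cost items, and the daily rates of $250 and $300 are sample inputs assumed for this illustration; a program would substitute its own worksheet figures and the actual rate quoted by its evaluator.

```python
def evaluator_cost(days_by_component, daily_rate):
    """Total evaluator days multiplied by the evaluator's daily rate."""
    return sum(days_by_component.values()) * daily_rate


def total_evaluation_cost(days_by_component, daily_rate, other_costs):
    """Evaluator cost plus other items such as tests, computer time, and reports."""
    return evaluator_cost(days_by_component, daily_rate) + sum(other_costs.values())


# Hypothetical evaluator days per component and other cost items (dollars).
example_days = {
    "Program description": 5,
    "Instructional methods": 6,
    "English language component": 12,
}
example_other_costs = {
    "Tests": 400,
    "Computer time": 150,
    "Report preparation": 200,
}

if __name__ == "__main__":
    budget = 7000  # hypothetical budget allocation
    for rate in (250, 300):
        total = total_evaluation_cost(example_days, rate, example_other_costs)
        status = "within budget" if total <= budget else "over budget"
        print(f"At ${rate} per day: ${total} ({status})")
```

Running the check at both ends of a quoted rate range shows whether the plan remains affordable if the higher rate applies.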

How to Use Worksheet No. 6 - Worksheet No. 6 may be used to estimate
the number of days needed to complete each component of the
evaluation. The recommended levels may be used or the program
director may wish to make his or her own estimate. The number of days
assigned to each task of the evaluation to be provided by program or
district personnel should be circled in order to clearly differentiate 
the days to be provided by the evaluator. 

This form should be completed by the program director and evaluator
after the evaluator is selected. Ideally, Worksheet No. 6 should be
completed individually by the program director and by the evaluator in
order to compare time allocation estimates. From the individual
estimates, the director and evaluator should prepare a final
allocation of level of effort to each task which should serve as the
management tool to guide the evaluation process. The worksheet
provides a summary of the level of effort and cost estimates for all
components of the evaluation. This worksheet can also be used to
obtain bids from external evaluators.

WORKSHEET NO. 6
Part A
(page 1 of 10)

ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
DESCRIBING THE PROGRAM AND THE STUDENTS

Estimates are provided for three levels of evaluation activity for a given
year (different activity levels may occur each year).

a) Minimum - collect information from project proposal, school records,
   and project director.

b) Moderate - collect information from project proposal, school records,
   project director, and a sample of one to three people in each category
   of project staff, bilingual teachers, district administrators, and
   parents using structured interviews or questionnaires. (For estimation
   purposes below, assume total number of people interviewed or
   receiving a questionnaire is eight.)

c) Major - same as that described for "moderate," except more people in
   each category are interviewed or sent questionnaires, plus classroom
   observations are conducted. (For estimation purposes below, assume
   the total number of people interviewed or receiving questionnaires is
   fifteen and that three classrooms are observed.)

                                   Level of Effort for a Given Year
                                              (in Days)

Task                               Minimum  Moderate  Major  Your Estimate*

1. Prepare, discuss with and         [?]      [?]      [?]
   obtain support of project
   director for proposed plan

2. Prepare data collection           [?]      [?]      [?]
   instruments (using samples
   provided in Designer's
   Manual)

3. Identify specific people          [?]      [?]      [?]
   or records from whom to
   collect data and make
   arrangements

4. Collect data                      [?]      [?]      [?]

5. Analyze and organize data         [?]      [?]      [?]
   for use in report and analysis
   of evaluation data collected
   for later components

Total Days                           (5+)    (13+)    (25+)   (   ) Evaluator

[Per-task entries are illegible in the source except for the isolated
values 5 and 12.]

* Circle the estimate for any tasks which will be done by project staff
  instead of the external evaluator. Do not include these amounts in
  the total for the evaluator.

WORKSHEET NO. 6
Part B
(page 2 of 10)

ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
EVALUATING PROGRAM OPERATIONS

Estimates are provided for two levels of activity to be conducted during
a given year for each of three components - instructional methods, staff
development, parent involvement. (Different levels of activity occur
each year):

Instructional Methods

a) Minimum - Conduct observations and interviews twice/year in only
             two classrooms and have evaluator do interpretation.*

b) Major   - Conduct observations and interviews three times/year in
             all classrooms (for estimation purposes below, assume
             total number of classrooms equal five) and have
             interpretative panel.*

Staff Training

a) Minimum - Same questionnaire given to trainees following each
             training session. Knowledge test not used and evaluator
             does interpretation. (For estimation purposes below,
             assume fifteen trainees and three training sessions.)*

b) Major   - Same as for minimum, plus a knowledge test given pre and
             post training, an end-of-project summary questionnaire
             given, and an interpretative panel is used. (For
             estimation purposes below, assume fifteen trainees and
             three training sessions.)

Parent Involvement

a) Minimum - Address only the issue of the extent to which the level of
             parent involvement matched the planned level; evaluator
             interprets data.

b) Major   - Address all four proposed evaluation questions given on
             page [?]. (For estimation purposes below, assume ten
             parents and eight staff members interviewed); have
             interpretative panel.

* The alternative methods of interpreting the data are discussed in the
  staffing chapter which follows.

WORKSHEET NO. 6
Part B
(page 3 of 10)

                                      Level of Effort (in Days)
Task                                  Minimum   Major   Your Estimate*

Instructional Methods

1. Prepare, discuss with and
   obtain support of project
   director for proposed plan

2. Prepare data collection
   instruments (using samples
   provided in Designer's
   Manual)

3. Identify who to observe
   and interview and make
   arrangements to do so

4. Collect data

5. Analyze data

6. Interpret data

7. Write report section

Total days                             (6+)     (20)    (   ) Evaluator
                                                        (   ) Project Staff

Staff Training

1. Prepare, discuss with and
   obtain support of project
   director for proposed plan

2. Prepare data collection
   instruments (using samples
   provided in Designer's
   Manual)

3. Make arrangements for data
   collection

4. Collect data - minimum (have
   trainer collect all data);
   major (have trainer collect
   all data except end of year
   questionnaire)

[Per-task entries on this page are illegible in the source.]

* Circle the estimate for any tasks which will be done by project staff
  instead of the external evaluator. Do not include these amounts in the
  total for the evaluator.

WORKSHEET NO. 6
Part B
(page 4 of 10)

                                      Level of Effort (in Days)
Task                                  Minimum   Major   Your Estimate

5. Analyze data                         [?]       7

6. Interpret data and                   [?]       3
   develop recommendations

7. Write report section                 [?]      [?]

Total days                              (5)    ([?]9)   (   ) Evaluator
                                                        (   ) Project Staff

Parent Involvement

1. Prepare, discuss with and            [?]       1
   obtain support of project
   director for proposed plan

2. Prepare data collection              [?]      [?]
   instruments (using samples
   provided in Designer's
   Manual)

3. Make arrangements for data           [?]      [?]
   collection

4. Collect data                         [?]       6

5. Analyze data                          1        3

6. Interpret data and                   [?]       2
   develop recommendations

7. Write report section                 [?]       2

Total days                              [?]     (15+)   (   ) Evaluator
                                                        (   ) Project Staff

ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
EVALUATING STUDENT OUTCOMES

Estimates are provided for two levels of activity to be conducted during a
given year for each of four components - English language component,
NonEnglish language component, nonlanguage academic component, and
nonacademic student effects.

English Language Component

a) Minimum - Use norm-referenced evaluation design only; analyze by
             grade, subject, language used in instruction, and student
             proficiency; evaluator does interpretation.

b) Major   - Use time series, norm-referenced, and comparison group
             evaluation designs; analyze by grade, subject, language
             used in instruction, student proficiency factors; use
             interpretative panel.

NonEnglish Language Component

a) Minimum - Use existing test and do norm-referenced evaluation design
             only; analyze by grade, subject, language used, and student
             proficiency; evaluator does interpretation.

b) Major   - Develop own test; use time series, norm-referenced, and
             comparison designs; analyze by grade, subject, language
             used in instruction, and student proficiency; use
             interpretative panel.

Nonlanguage Academic Component

a) Minimum - Use existing test, compare to national norms; analyze only
             by grade; evaluator does interpretation.

b) Major   - Use existing test, compare to national norms; analyze by
             grade and two other key factors; use interpretative panel.

Nonacademic Student Effects

a) Minimum - Use only a published self-concept measure; analyze only by
             grade and student proficiency; evaluator does interpretation.

b) Major   - Use all proposed evaluation questions and data collection
             instruments; analyze by grade and student proficiency; use
             interpretative panel.

WORKSHEET NO. 6
Part C
(page 6 of 10)

                                      Level of Effort (in Days)
Task                                  Minimum   Major   Your Estimate*

English Language Component

1. Prepare, discuss with and
   obtain support of project
   director for proposed plan

2. Select appropriate tests

3. Train test administrators and
   make arrangements for testing

4. Supervise testing - minimum
   (one day each, pre- and post-
   testing); major (monitor all
   testing)

5. Analyze data - minimum (prepare
   achievement data for
   standard computer analysis);
   major (prepare data for
   standard computer analysis
   for several analyses)

6. Interpret results

7. Write report section

Total days                             (12+)    (50+)   (   ) Evaluator
                                                        (   ) Project Staff

[Per-task entries for this component are illegible in the source except
for the isolated values 8+, 10, and 10+.]

NonEnglish Language Component

1. Prepare, discuss with and
   obtain support of project
   director for proposed plan

2. Select appropriate tests

3. Train test administrators and
   make arrangements for testing

* Circle the estimate for any tasks which will be done by project staff
  instead of the external evaluator. Do not include these amounts in the
  total for the evaluator.

                                      Level of Effort (in Days)
Task                                  Minimum   Major   Your Estimate*

4. Supervise testing - minimum          [?]      10+
   (one day each, pre- and post-
   testing); major (monitor all
   testing)

5. Analyze data - minimum (prepare       2        8
   achievement data for
   standard computer analysis);
   major (prepare data for
   standard computer analysis
   for several analyses)

6. Interpret results                     2       10

7. Write report section                  2       10

Total days                             (10+)    (45+)   (   ) Evaluator
                                                        (   ) Project Staff

Nonlanguage Academic Component

1. Prepare, discuss with and
   obtain support from project
   director for proposed plan

2. Select appropriate tests -
   minimum (become familiar with
   district tests); major (review
   commercial achievement tests
   and match to curriculum)

3. Train test administrators and
   make arrangements for testing

4. Supervise testing - minimum
   (one day each, pre- and post-
   testing)

5. Analyze data - minimum (prepare
   achievement data for
   standard computer analysis);
   major (prepare data for
   standard computer analysis for
   several analyses)

[Per-task entries for this component are illegible in the source except
for the isolated values 5, 2, and 10+.]

                                      Level of Effort (in Days)
Task                                  Minimum   Major   Your Estimate*

6. Interpret results                    [?]      [?]

7. Write report section                 [?]      [?]

Total days                             (10+)    (45+)   (   ) Evaluator
                                                        (   ) Project Staff

[Per-task entries for tasks 6 and 7 are illegible in the source except
for the isolated values 10 and 3.]

Nonacademic Component

1. Prepare, discuss with and            [?]      [?]
   obtain support from project
   director for proposed plan

2. Select or develop appropriate         1       [?]
   instruments

3. Train test administrators and         1       [?]
   make arrangements for testing
   and other data collection

4. Analyze data - minimum (prepare       2       [?]
   for standard computer
   analysis)

5. Interpret results                     2       10

6. Write report section                  1        8

Total days                              (8+)   (3[?]+)  (   ) Evaluator
                                                        (   ) Project Staff

SUMMARY OF ESTIMATED LEVEL OF EFFORT
REQUIREMENTS AND
ASSOCIATED COSTS

Summary of Days                      Evaluator      Project Staff

Program Description                  _________      _________

Monitoring Program Operations

   Instructional Methods             _________      _________

   Staff Training                    _________      _________

   Parent Involvement                _________      _________

Evaluating Student Effects

   English Language Component        _________      _________

   NonEnglish Language Component     _________      _________

   Nonlanguage Academic Component    _________      _________

   Nonlanguage Student Effects       _________      _________

Total days                           (       )      (       )

Total days  x  Evaluator cost per day  =  Total evaluator cost per year

_________   x  _________               =  _________

WORKSHEET NO. 6
Part D
(page 10 of 10)

                                      Costs (in Dollars)

Additional Cost Items      Program       Monitoring           Evaluating
                           Description   Program Operations   Student Effects

1. Secretary time

2. Printing

3. Mailing

4. Other

   a.
   b.
   c.
   d.
   e.

Totals

Total Evaluator Costs
Total Additional Costs
Total Costs for Evaluation

5. Plan the Data Analysis Function

The program director and evaluator should plan the specific data
analysis activities that will be required by the evaluation. The type
of analysis and techniques to be used will depend largely on the types
of data collected. Data from the first facet of the evaluation will 
consist primarily of narrative descriptions of program operations, as 
well as responses from the interviews collected. Data from the second 
facet of the evaluation will be primarily in the form of test scores. 

The data analysis required by the recommended evaluation model is
straightforward and relatively easy to perform. Program operations
data are analyzed by simply comparing two sets of similar data. One
set describes the program as it was intended to operate, while the
other describes how the program is actually operating. Therefore, the
only analysis required is to examine the information and determine if
there is a difference between the two sets of data. The analysis of
student outcome data is somewhat more technical, but can be performed
by a trained evaluator. The analysis procedures are usually found
written in the test manual supplied with the test. These procedures
are usually easy to perform.
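The comparison of intended and actual program operations can be illustrated with a short sketch. It assumes that each program feature has been reduced to a single comparable value; in practice much of this information is narrative, so the feature names, values, and the `find_discrepancies` helper below are hypothetical.

```python
def find_discrepancies(intended, actual):
    """Return {feature: (intended value, actual value)} wherever the two differ."""
    features = sorted(set(intended) | set(actual))
    return {
        feature: (intended.get(feature), actual.get(feature))
        for feature in features
        if intended.get(feature) != actual.get(feature)
    }


# Hypothetical program features: how the program was designed to operate.
intended_program = {
    "Hours of English instruction per week": 10,
    "Teacher aides per classroom": 1,
    "Parent advisory meetings per year": 4,
}
# How the program was observed to operate (also hypothetical).
actual_program = {
    "Hours of English instruction per week": 10,
    "Teacher aides per classroom": 0,
    "Parent advisory meetings per year": 2,
}

if __name__ == "__main__":
    for feature, (planned, observed) in find_discrepancies(
        intended_program, actual_program
    ).items():
        print(f"{feature}: planned {planned}, observed {observed}")
```

A feature that appears in only one of the two descriptions is also reported, with `None` standing in for the missing side.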

The two types of data are analyzed separately and are intertwined only 
through the efforts of a perceptive evaluator. The two types of data 
can stand alone and do not need to be integrated. However, a 
perceptive evaluator will be able to see how the two types of data can 
be used to support each other.

The important consideration during the planning stage is to determine
how the analysis function will be conducted. Data analysis will most
probably be performed by the evaluator. The time schedule for the
evaluation should allow ample time to conduct the analyses.

6. Plan the Data Interpretation Function 

Data interpretation in bilingual program evaluation is often not a
strictly empirical task. To repeat the basic premise of this
Handbook, it is probably impossible to show that children in the
bilingual program did better in the program than they would have
without it by employing conventional social science research methods.
Therefore, interpreting the data obtained by evaluation efforts is not
a mechanical exercise of reciting significant alphas. Rather than
concluding that the bilingual program "works" better than some
alternate treatment, the interpretive exercise is more likely to be in
the nature of a policy question. Does the bilingual program "work"
well enough? Are decisionmakers and constituents satisfied with the
program and the students' progress? Recognizing the policy
implication function of data interpretation, an interpretive panel may
be a better alternative to perform this function. Chapter IV provides
a more detailed discussion on the interpretation function. 

Therefore, an important step in the evaluation is deciding how the
interpretation function will be accomplished. An evaluation may be
technically sound and well conducted, but may fail to be used by
decisionmakers because appropriate people were not involved in
interpreting the data and in the development of recommendations.

Two basic approaches are suggested for data interpretation and
formulating recommendations for program modification. The first
approach is for the evaluator to analyze, study, and interpret the
results. Using informal means, the evaluator then checks the
interpretations and recommendations with program staff and others as
he/she deems appropriate. The second approach is to convene a panel
of people with various perspectives on the program and have them
interpret the results. The panel may consist of individuals who are
representative of the various audiences.

7. Plan the Reporting of the Evaluation 

Preparation of the final evaluation report is an important activity of
the evaluation. The evaluation report is the final and most visible
product of the evaluation. Steps should be taken to assure that the
report addresses the purposes and specific questions of the
decisionmakers for whom the evaluation was planned. In addition, the
evaluation results should be reported in a timely manner, taking care
to ensure that the technical aspects of the evaluation effort are
clearly presented. Together, these steps increase the usefulness of
the evaluation results.

Preparation of the final evaluation report can be a time-consuming and
burdensome process if not properly planned. However, reporting should
be a continual process or activity that occurs throughout the
evaluation cycle. For example, Chapter III recommends that a brief
report be prepared following each classroom observation. These brief
reports should be summarized at least three times during the program
year (fall, winter, and spring) and should be shared with program
personnel so that they can become part of the program improvement
process. These brief reports and summaries prepared throughout the
evaluation cycle will all feed into the final evaluation report, thus
simplifying the reporting process. The preparation and sharing of
evaluation information throughout the evaluation cycle also serves to
strengthen communication between the evaluation audiences and those
conducting the evaluation, thereby increasing the use of evaluation
results.

There are a number of basic principles which pertain to the reporting 
process and serve to simplify preparation of the final evaluation 
report. This discussion assumes that completion of the report is the 
primary responsibility of the program evaluator(s) contracted to 
undertake major segments of the bilingual program evaluation. 
Basically, the evaluator has three important tasks: develop an 
understanding of the audiences who will use the information, select a 
proper reporting format(s), and assist the audiences in using the 
results. Proper planning of the reporting requirements will make this
final activity easy to complete. 

Several standard elements should be included in the report. These
include:

o Statement of purpose;

o Program overview and background;

o The goals and objectives of the bilingual
  program;

o Description of the program and students;

o Discussion of the methodology used, including
  design, sampling strategy, instrumentation, and
  data analysis procedures; and

o Presentation of the findings, conclusions, and
  recommendations for program change.

The report should be concise and should include easily interpreted
tables, graphs, and other figures, limiting the amount of narrative
material presented. Important issues should be identified and
highlighted in the report if the results of the evaluation effort are 
to be maximized. Techniques such as boxing in recommendations or 
using a different type face are useful to highlight the most important 
points of the report. Examples of actual data collection instruments 
should be included in an appendix. Chapter V provides more detailed 
guidelines for developing the report. 



CHAPTER II 



ESTABLISHING BASELINE DATA REQUIRED FOR THE EVALUATION 



The evaluation model for evaluating Title VII bilingual education
programs presented in this Handbook has two components. The first
component evaluates program operations (e.g., program administration,
staff development, parental involvement) using a discrepancy
evaluation design. The second evaluates student outcomes. Results of
these two evaluation activities taken together constitute the basis
for determining how the program operated and provide a description of
student performance.



In order to conduct the discrepancy evaluation of program operations, 
information on how the program was originally designed and intended to 
operate must be collected and documented. This information serves as 
the baseline data, which are compared to the data resulting from the 
actual evaluation of program operations as described in Chapter III. 

The baseline data are also taken into account in developing the
evaluation design for the student outcomes component of the
evaluation. Therefore, a very early and important step in
conducting an evaluation of a bilingual program is the establishment
of baseline information about the total program.

This description identifies who the program is meant to serve, the
exact services of the program, how these services are to be provided,
and what outcomes are expected from the services. This description
enables the evaluator to determine (a) whether the bilingual program 
meets the original intent, and (b) whether any marked achievements can 
reasonably be attributed to the program. 

Comparing the original program design, as described by the baseline
data, to its actual operation, as determined by the evaluation of
program operations, will indicate areas of the program that have
either not been implemented or have changed from the time that the
program was originally designed. Discrepancies identified as a result
of this comparison are a powerful management tool for the program
director and a programmatically useful part of the whole evaluation
process. This comparison can also help to determine whether the goals
of the program are reasonable, and provide information about the
relationship between program activities and program outcomes.

In order to accomplish this, the persons conducting the various 
evaluation activities must first develop proper documentation of the 
program context, the target students, the program goals, and the 
instructional program. This is not a difficult task. The information
to be collected should clearly describe how the program is designed to
meet its goals, as well as the total environment in which the program
operates. Once this documentation is accomplished, the program
director, with assistance from the evaluator, will be able to use the
information to design the evaluation and to analyze and interpret the
evaluation results. The documentation does not need to be elaborate,
simply informative. Most importantly, the information collected
should be complete, detailed, and easy to understand.

Baseline Data Needed for the Evaluation



1. Describe the Context of the Program

Develop a simple but accurate description of the school district and
neighborhood. Data from previous evaluation reports can be easily
updated, thus avoiding surveys or other time-consuming efforts. The
type of information that should be covered in the description
includes:

o Community characteristics
   - Languages spoken
   - Ethnicity
   - Socioeconomic status (SES) levels
   - Mobility and length of residence
   - Size

o Local Education Agency (LEA) description
   - Size
   - Financial status
   - Facilities available for the bilingual program
   - General goals
   - Philosophy towards language and cultural diversity

o School description
   - Number of bilingual students by language group
   - Number in the bilingual program
   - How students are assigned to classrooms
   - Bilinguality mix in classrooms
   - Parent involvement in school affairs

The information collected on the program context should be compiled
immediately after the data-gathering phase. While technical analysis
of the information is not required, the program director and evaluator
should review the data in order to plan the program monitoring portion
of the evaluation and make preliminary decisions on how the data will
be used during analysis to determine program outcomes. The
information should be written in narrative form for inclusion in the
final report. The topics and subheadings provided above may serve as
an outline for reporting this information.



2. Describe the Students 

Baseline information about the language proficiency and dominance,
cultural background, and overall academic achievement of the students
enrolled in the bilingual program is essential for designing and
conducting the evaluation. The data must include information on the
skill level of the students in both English and their home language,
as well as their level of performance in the subject areas being
taught. The description should also include information on the
students' learning background and school environment. At a minimum,
the baseline data must include information on the following areas:
o Definition of project student

o Student selection criteria & method
   - Tests & cut-off scores used
   - Role of teacher judgment
   - Role of parent wishes
   - Method of combining criteria

o Exit criteria & follow-up

o Student turnover

o Student characteristics at beginning of year
   - Language proficiency
   - Achievement level
   - Biographic data

This information is essential for grouping students according to both 
current skills and past experience during data analysis activities and 
plays a major role in determining student performance. For example, 
a student with a low English reading pretest score might be expected 
to show greater improvement if he or she were a new arrival from a 
high SES background with no previous training in English reading,
than if he or she were from a low SES background and had been in a 
bilingual program for several years. 
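
The grouping described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the record fields (`ses`, `years_in_program`, `pretest`, `posttest`), the scores, and the two-way grouping rule are hypothetical and are not prescribed by the Handbook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; field names and scores are illustrative only.
students = [
    {"id": 1, "ses": "high", "years_in_program": 0, "pretest": 20, "posttest": 35},
    {"id": 2, "ses": "high", "years_in_program": 0, "pretest": 22, "posttest": 34},
    {"id": 3, "ses": "low", "years_in_program": 3, "pretest": 21, "posttest": 26},
    {"id": 4, "ses": "low", "years_in_program": 2, "pretest": 19, "posttest": 25},
]


def baseline_group(student):
    """Build a (SES, program experience) key from the baseline data."""
    experience = "new arrival" if student["years_in_program"] == 0 else "continuing"
    return (student["ses"], experience)


def mean_gain_by_group(records):
    """Average posttest-minus-pretest gain within each baseline group."""
    gains = defaultdict(list)
    for s in records:
        gains[baseline_group(s)].append(s["posttest"] - s["pretest"])
    return {group: mean(values) for group, values in gains.items()}


summary = mean_gain_by_group(students)
for group, gain in sorted(summary.items()):
    print(group, gain)
```

Reporting gains separately for each baseline group keeps students with similar pretest scores but different backgrounds from being averaged together.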

A more accurate understanding of the evaluation results can be
obtained if the baseline data present a clear picture of the
environment and learning history of the students in the program.
Unfortunately, few programs collect this information during the
evaluation, and even fewer present a systematic treatment of this
information in evaluation reports.
Because most bilingual programs span several grade levels and are
funded for a minimum of three years, bilingual programs should develop
multi-year student profiles. These multi-year profiles can increase
the value of the student descriptions. Since most schools keep
permanent student record files, the evaluator can easily make minor
additions to the records each year to ensure that the appropriate
background and information on services is readily available for each
student in the program.

Many programs enroll substantial numbers of monolingual,
native-English speakers, as well as students classified as
limited-English-proficient (LEP), but who may be proficient in
English. It is necessary to maintain the same amount of information
on English language experience for these students as is required for
non-proficient students. Knowledge of the different language levels
of students in a class can be used to describe the linguistic
environment of that class. Information on these students can be
analyzed separately from that collected for students who are learning
English as a second language to determine the effects of bilingual
instruction on these students.

Information on the students should be compiled in narrative form for
inclusion in the final report. This information differs from that
collected in the previous section in two important ways: first, this
information could change markedly from one year to the next (the
information on the community may change but it is likely to be
gradual), and second, information on the students can be modified by
the program.
3. Describe the Program Goals

Developing a clear and complete description of the goals of the
program is an essential part of establishing baseline data. Goal
setting, although important, is often overlooked or ignored during the
program planning stage. Therefore, many programs operate year-to-year
with little or no set direction. Programs that fail to establish
clear and measurable goals cannot expect to be able to measure program
outcomes.

Most program goals are established to meet local, State, and possibly 
Federal guidelines in addition to other guidelines developed by 
parents and program personnel. Simply complying with these guidelines 
often determines the major goals and how they will be met. These
goals, as well as those intended to meet local needs, should be 
included in the description. Also included should be a timetable for 
accomplishing the goals. 

Programs should distinguish between short-term, intermediate goals 
relevant to a single-year evaluation and long-range goals that can be 
evaluated only over a period of several years. Failing to make this 
distinction creates problems for bilingual programs, since some
long-term goals (e.g., improved English skills) may not be applicable
and measurable until the later grades. Long-term goals are also
affected by the high rate of student turnover experienced by many
bilingual programs. Since long-term goals would not apply to a
short-term student, two sets of goals are required. This should be
clearly stated and presented in the baseline data being collected.



Defining and describing student achievement goals is another important 
step in establishing baseline data. While there are many important 
considerations to recognize when specifying student achievement goals, 
the baseline data must include information on: 

o Subject areas (e.g., reading, language, math); 

o Languages to be used (e.g., English, Spanish, 
etc.); 

o Student language proficiency category (e.g.,
English: limited or proficient, Spanish: limited
or proficient);

o Grade level; and

o Student affective goals (e.g., self-concept and
attitudes towards school).

Students who are exited from a bilingual program to a conventional
classroom often require special follow-up services. Districts that
provide such services should clearly specify and carefully describe
how they are integrated into the goals of the program, along with
other educational goals.

Because the original needs of the program, as stated in the proposal, 
may have changed, the information collected should be reviewed by the 
program director. Changes that have occurred should be properly 
documented. 

A detailed description of the goals for each component of the project
being evaluated (e.g., program operations, parent involvement, staff
development, and student effects) should be developed. The
baseline data collected should be used to finalize the evaluation
design and to ensure that each goal is appropriately measured by the
evaluation activity. The information will also be used to interpret
the evaluation results and make recommendations. The Final Evaluation
Report should indicate if progress towards meeting the goals was
measured, if the goals were met, and if not, what changes are
necessary to ensure that the goals will be met, or what changes should
be made in the goals. It is important to remember that not all goals
need to be met in the current reporting period.

4. Describe the Instructional Program

Establishing baseline data for the instructional program requires more
time and effort than any of the other three areas on which information
is collected. Baseline data collection on the program context,
students, and program goals basically requires the review of existing
records, files, and the original project proposal. Baseline data 
collection for the instructional program, however, requires 
face-to-face interviews of persons associated with the program, as 
well as review of program documents. 

A description of the instructional program can be divided into three
categories:

o An overview of the program as it was originally
designed and initially implemented;

o A description of the instructional approach used
in the program, including (1) student selection,
(2) self-concept and cultural emphasis, (3)
content of instruction, (4) presentation of
content, and (5) scheduling; and

o Management of the program, including (1) staff
organization, (2) staff roles, (3) staff
development, (4) parent and community factors, (5)
communication links with different audiences, and
(6) dissemination of program information.

Thus, the description of the instructional program is the most 
exhaustive of the activities associated with the establishment of
baseline data. 

Information for the program overview can be collected easily from 
information contained in the grant proposal. It should include the 
grade levels and number of classrooms served by the program, the 
amount of instructional time devoted to dual language instruction, and
a definition of the program design (maintenance, transitional, etc.). 

A description of the actual instructional approach used in the 
classroom and the basis for that approach require the most 
comprehensive description of any part of the bilingual program. This 
information can be collected from program related documents, student 
records, classroom observations and interviews with program 
administrators, teachers and parents. This description is also the 
most important element used during the data analysis and 
interpretation. It is therefore essential that program personnel pay 
particular attention to this component. A partial listing of the
types of information to be collected follows. An expanded listing is
included in the Technical Appendix.

Descriptive Information on the Instructional Approach

1. Content of instruction

   a. Content areas covered

   b. Who determines content

   c. Other content features

      (1) Relationship of content to goals

      (2) Articulation of project content with existing
          district curriculum

2. Presentation of content

   a. Instructional approach

      (1) Type, e.g., concurrent, alternate day/week,
          preview/review, half-day, resource room,
          and/or bilingual aide

      (2) Organizational practices, e.g., individualized,
          large group, learning centers, peer tutoring,
          small group instruction, and/or team teaching

   b. Methodologies for bilingual education

      (1) Language of instruction

      (2) Approach to second language instruction

      (3) Approach to reading instruction

      (4) Approach to other academic instruction

4. Scheduling

   a. Grouping and regrouping

      (1) Across classes

      (2) Within classes

   b. Daily schedules

Identifying the goals of program services and describing variation in
educational service is a very important part of this activity. In
most programs, the services vary for different students depending on
their language skills, reading and math skills, and other factors. In
such cases, each different service must be described separately, and,
when analyzing the evaluation data, students must be grouped according
to the services they received.

In describing the bilingual program, it is essential to describe
clearly what the students have experienced throughout their
participation in the program. Therefore, a multi-year description of
services should be developed. For example, bilingual programs that
include a coordinated curriculum for grades K-6 must describe the
complete program.

A description of the overall program organization and management is 
the last requirement of the baseline data collection activity. This
description will provide the basis for evaluating the operational 
effectiveness of the program. The information should cover the 
following areas: 
Descriptive Information on the Instructional Program



1. Staff Organization 

a. List of staff members and time commitment 

b. Organizational structure 

c. Qualifications 

d. Selection procedures 

2. Responsibilities and Roles of Program Personnel 

a. Project Director 

b. Teachers 

c. Aides 

d. Other staff 

3. Staff Development Program 

a. Needs assessment

b. Structure of training

c. Characteristics of Training 

d. Audiences Trained 

4. Parents and Community 

a. Parent involvement in school affairs 

b. Community input in program planning, e.g., through 
advisory group. 

c. Community support for project 

d. Parent education 

e. Parent conferences/counseling 

5. Communication 

a. Staff relations 

b. Relations with nonproject staff 

6. Dissemination of project information 

a. School personnel 

b. Parents and community 
5. Develop the Program Description



The amount of data to be collected will obviously vary from program to
program. Once the evaluator, in consultation with the program
director, makes the necessary decisions on what information to
collect, the sources for the information should be identified and the
proper data collection instruments selected and/or developed.
Information on each program component will come from several sources.
These sources may include:

o The program proposal; 

o Student records and other files; 

o Previous evaluation; 

o The program director; 

o Program staff; 

o Bilingual teachers; 

o District administrators; 

o Classroom administrators; 

o Classroom activities; and 

o Parents. 

Information from these sources is obtained by examining program 
documents and from interviews. Data collection should begin by 
September 15th of the first project year. Data should be updated by 
September 15th of each of the following years as needed to permit 
current analysis and interpretation. 
Much of the information to describe the program context, goals, and
management can be found in the grant proposal, prior evaluation
reports, and other related documents. These sources may provide some
information on the instructional program as well. Student records
will provide information on the students' characteristics, prior
history, and performance. Worksheet No. 7 can be used to extract
this information.

However, a significant amount of information will have to be collected 
from program personnel. Worksheets 8, 9, and 10 are sample interview 
schedules provided to assist the persons conducting the data 
collection activities. Worksheets 7, 8, 9, and 10 are included at 
the end of this narrative section. 

How to Use Worksheets No. 7, 8, 9, and 10. These worksheets are
designed to gather information from program documents, the program 
director, program staff, and local and district administrators. The 
person conducting these activities can interview the project director 
(Worksheet No. 8), program staff (Worksheet No. 9), and local and 
district administrators (Worksheet No. 10) and record their responses 
to questions that appear on these worksheets. These worksheets can be 
modified to meet the unique needs or focus of individual bilingual
programs. Once all the interviews have been completed, the 
information should be synthesized to produce a document which provides 
a clear description of each of the four components of the intended 
program as originally described. 
Each of the questions on these worksheets corresponds to one of the
four program components (context, students, goals, and the
instructional program). Each question is coded with a letter which
identifies which component the question corresponds to. The coded
letters are: C for program context; S for students; G for program
goals; and P for instructional program.
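
As a rough sketch of how the coding scheme supports synthesis, the fragment below sorts recorded answers by component letter. The code letters come from the text above; the response tuples, their wording, and the function name are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Component codes as defined in the text above.
COMPONENTS = {
    "C": "program context",
    "S": "students",
    "G": "program goals",
    "P": "instructional program",
}

# Hypothetical interview notes: (worksheet number, question number, code, response).
responses = [
    (8, 1, "G", "Goals as stated in the proposal were confirmed."),
    (8, 3, "C", "Community mobility is high."),
    (8, 4, "P", "Students are assigned by language proficiency."),
    (10, 2, "C", "District supports language diversity."),
]


def group_by_component(items):
    """Collect responses under the program component named by their code."""
    grouped = defaultdict(list)
    for worksheet, question, code, text in items:
        if code not in COMPONENTS:
            raise ValueError(f"Unknown component code: {code!r}")
        grouped[COMPONENTS[code]].append((worksheet, question, text))
    return dict(grouped)


by_component = group_by_component(responses)
for component, notes in by_component.items():
    print(component, len(notes))
```

Grouping the answers this way gives one bundle of notes per component, which can then be written up as the narrative description of that component.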

Once the interviews have been conducted, the person conducting the
interview can readily provide the information that describes that
particular subsection of the program component.

6. Document and Report the Baseline Data

Once the desired information is collected, attention should be focused 
on the various ways it is to be used. The information:

o Will be used as baseline information during the 
program monitoring activities of the evaluation 
process; 

o Will provide a partial basis for planning the
analysis and interpretation of student outcomes, 
as described in Chapter IV; and 

o Will be reported directly to various audiences as 
part of the evaluation reports written for them. 

Immediately after the preliminary data have been collected, the data
should be summarized in the form that they will appear in the Final
Evaluation Report and submitted to the program director for review.
An initial analysis and interpretation of the data should be conducted
to determine which variables, if any, are to be used as a basis for
separate analyses. Chapters III and IV provide more detailed
information on how to conduct the data analysis and interpretation
function.
WORKSHEET NO. 7

DATA COLLECTION FORM FOR INFORMATION FROM THE
PROJECT PROPOSAL AND OTHER RECORDS

The project proposal and various project or school records should be reviewed
to obtain the indicated information.

(G) 1. What are the major project goals?

       Linguistically

       Culturally

       Academically

(S) 2. What is the pattern of predominant languages among the student
       population?

    3. What is the approximate achievement level (in languages, other
       academic and nonacademic areas) of students within the various
       language categories? Report separately for each language group.

       Language achievement

       Other academic achievement

       Nonacademic achievement

(P) 4. What grade levels and how many classrooms are served by the project?

(P) 5. What portion of the school day is covered?

(C) 6. Describe the following community characteristics:

       a. Languages spoken (approximate percentage speaking each language)

       b. Ethnicity (approximate percentage of each)

       c. Socioeconomic status (general description based on type of
          employment)

       d. Size of community

(C) 7. Describe the local education agency as follows:

       a. Size

       b. Financial status of district

       c. Facilities available for project

(C) 8. Describe the following school characteristics:

       a. Number of bilinguals in school by language group

       b. Number of students in bilingual program

       c. Bilingual mix in the classrooms

(P) 9. Describe the project staff and its organization. List each member of
       the staff, the percentage of time committed to the project and their
       qualifications.

       Name          Percentage time          Qualifications

       Describe the organizational structure of the project.

       What selection procedures are used in selecting staff members?

(P) 10. Describe the project director's role with respect to the following
        items:

       a. Funds and budgets

       b. Public relations

       c. Administration

       d. Overseeing instruction

       e. Staff training

       f. Developing and ordering materials and equipment

       g. Staff recruiting and hiring

C = refers to program context
S = refers to program students
G = refers to program goals
P = refers to instructional programs
WORKSHEET NO. 8

PROJECT DIRECTOR INTERVIEW SCHEDULE

(G) 1. The goals of the program as stated in the proposal are as follows:
       (Present the goals orally or in writing as obtained from the
       proposal.)

       What evidence will show that these goals have been met?

       Which goals have the highest priority?

(G) 2. How would you define the project as to the extent to which it is a
       maintenance, transitional or partial bilingual program?

(C) 3. Describe the mobility of the community including any specific data
       available.

(P) 4. How are students assigned to classrooms?

(S) 5. Describe the student entry and exit criteria and procedures. Do the
       actual procedures conform to the planned procedures?

(P) 6. Describe the scheduling of instruction including daily schedules and
       grouping and regrouping across and within classes.

(P) 7. Describe the staff and its organization in terms of the following
       dimensions:

       a. Staff members' time commitments

       b. Staff organizational structure

       c. Staff qualifications

       d. Staff selection procedures

(P) 8. What is your general leadership style as program director?

(P) 9. What is your role as program director with respect to each of the
       following areas?

       a. Funds and budgets

       b. Public relations

       c. Administration

       d. Overseeing instruction

       e. Staff training

       f. Developing and ordering materials and equipment

       g. Staff recruiting and hiring

(P) 10. What is the teacher's role in the following areas?

       a. Planning instruction

       b. Implementing instruction

       c. Noninstructional responsibilities

(P) 11. What is the role of the aides in the program?

(P) 12. What is the role of other staff members such as the following?

       a. Instructional coordinator

       b. Community coordinator

       c. Evaluator

       d. Other (please specify)

       e. Other (please specify)

(P) 13. Describe the program's staff development activities related to the
        following aspects.

       a. Needs assessment

       b. Structure of training (pre-service and in-service)

       c. Characteristics of training

          (1) Appropriateness for staff of differing levels of knowledge
              and experience

          (2) Practicality

          (3) Coordination with degree programs

          (4) Integration with other training

       d. Audiences trained (program and/or nonprogram staff)

(P) 14. Describe the involvement of the community and parents with respect
        to the following items.

       a. Parent involvement in school affairs

       b. Community input in program planning

       c. Evidences of community support for the program

       d. Parent education

       e. Parent conferences/counseling

(P) 15. Describe the means of communication of the following groups.

       a. Among program staff

       b. Program staff with the following nonproject staff:

          (1) Principals

          (2) Other district administrators

          (3) Nonprogram teachers

          (4) School board

(P) 16. What means are used to disseminate project information to school
        personnel, parents and community?

C = refers to program context
G = refers to program goals
S = refers to students
P = refers to instructional program
WORKSHEET NO. 9

PROGRAM STAFF INTERVIEW SCHEDULE

Project staff ____     Bilingual teacher ____     (Check one)

(G) 1. What is the intended content of instruction (i.e., the theoretical
       curriculum) with respect to the following matters?

       a. Content areas covered

       b. Relationship of content to project goals

       c. Who determines the content?

       d. What articulation is there between project content and the
          extant district curriculum?

(P) 2. Describe the presentation of content with respect to the following
       items.

       a. Type of instructional model or theory (e.g., concurrent, alternate
          week/day, preview-review, half day, resource room, and/or
          bilingual aide)

       b. Organizational practices (e.g., individualized, large group,
          learning centers, peer tutoring, small group instruction, and/or
          team teaching)

(P) 3. Describe the methodologies employed for bilingual education with
       respect to the following items.

       a. Language of instruction

          (1) General language use plan of teacher and student over length
              of program

          (2) Daily instructional time in each language

          (3) Variations for different student groups

          (4) Criteria for establishing language of instruction

       b. Approach to nonstandard forms

          (1) Acceptance

          (2) Form of corrections

       c. Approach to second language instruction

          (1) Formal instruction

          (2) Functional use of second language for content instruction
              and other activities

       d. Approach to reading instruction

          (1) Language in which students learn to read

          (2) Criteria for beginning reading in second language

(P) 4. Describe the specific instructional methodologies used in each
       subject area.

(P) 5. Describe those aspects of the program that are intended to motivate
       students and improve their self-concept with respect to the
       following matters:

       a. Appropriate content and language of instruction

          (1) Using [illegible] for instruction

          (2) Accepting language of the student

          (3) Content that relates to experiences of students

          (4) Culturally relevant material

       b. Improved affective climate

          (1) Placing equal value on both languages and culture

          (2) Ensuring student success

          (3) Involving parents

          (4) Teacher as a role model

       c. Discipline approach

          (1) Philosophy

          (2) Guidelines/approach to control

          (3) Special reward systems (e.g., prizes and privileges)

(P) 6. What materials are used within each of the following categories?

       a. Core materials in use

          (1) Commercial

          (2) Locally developed

       b. Appropriateness

          (1) Linguistic

          (2) Cultural

    7. Describe the role of each of the following personnel in the classroom.

       a. Teachers

       b. Aides

       c. Parents

       d. Peers

       e. Resource staff

(P) 8. Describe the program director's work in respect to the following.

       a. Leadership style

       b. Role or responsibilities in connection with each of the following:

          (1) Funds and budgets

          (2) Public relations

          (3) Administration

          (4) Overseeing instruction

          (5) Staff training

          (6) Developing and ordering materials and equipment

          (7) Staff recruiting and hiring

C = refers to program context
S = refers to students
G = refers to program goals
P = refers to instructional program
WORKSHEET NO. 10

LOCAL AND DISTRICT ADMINISTRATORS INTERVIEW SCHEDULE

(G) 1. Describe the school district's general goals.

(C) 2. What is the school district's philosophy toward language and
       cultural diversity?

(P) 3. To what extent is there articulation of project content with the
       existing district curriculum?

(P) 4. What is the relationship between the project staff and each of the
       following categories of district personnel? Comment specifically
       on program acceptance.

       a. Principals

       b. Central office administrators

       c. Nonproject teachers

       d. The school board

    5. Describe the dissemination of program information to the following
       two groups.

       a. School personnel

       b. Parents and the community

C = refers to program context
S = refers to students
G = refers to program goals
P = refers to instructional program

CHAPTER III

CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS 



The successful completion of the planning activities and the
establishment of the baseline data for the evaluation enable the
program director to initiate the actual evaluation of the bilingual
program. As described before, the actual evaluation of the bilingual
program takes two thrusts: the evaluation of program operations and
the evaluation of student outcomes. These may be viewed as totally
separate activities. However, the outcomes or outputs of both
activities are used during the analysis function to interpret the
overall evaluation results and formulate recommendations for changes
in the program. This chapter presents guidelines and procedures for
conducting one part of the evaluation, the evaluation of program
operations.

The evaluation of program operations employs the discrepancy
evaluation design described earlier. In simple terms, the
evaluation of program operations is performed by first establishing
the baseline data on the program. This activity should already have
been accomplished in accordance with the recommended procedures in
Chapter II. Most importantly, it should have been completed during
the first month or, at the latest, by the end of the second month of
the program year. The second activity required to perform this facet
of the evaluation is to collect another set of data similar to the
baseline data. Decisions on what data to collect, how and when to
collect the data, and who will collect the data have already been made
during the planning phase of the evaluation activity (see Chapter I).
Most of these data are collected by monitoring classroom activities
and interviewing various persons associated with the bilingual program.
This set of data, describing actual program operation (e.g., the
instructional method being used, the amount of instruction in English,
the number of teacher aides assigned to a class), is compared to
the baseline data collected at the beginning of the school year, which
describe the program design. The comparison provides the basis for
determining whether the program was operated as planned. If it was,
there should be few or only minor discrepancies between the two sets
of data that describe the program. If the comparison reveals
significant discrepancies or deviations, the evaluation must document
why they occurred.
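
For readers who keep their records electronically, the comparison step can be pictured as an item-by-item check of two records. The following is a minimal sketch in Python, assuming the baseline and the observed data have each been reduced to a small set of named program features. The feature names and values are hypothetical and are not taken from the Handbook's worksheets.

```python
# Hypothetical program features; names and values are illustrative only.
baseline = {
    "instructional_method": "team teaching",
    "daily_minutes_english": 120,
    "aides_per_class": 1,
}
observed = {
    "instructional_method": "team teaching",
    "daily_minutes_english": 180,
    "aides_per_class": 0,
}


def find_discrepancies(planned, actual):
    """Return the features whose planned and actual values differ."""
    discrepancies = {}
    for feature in sorted(set(planned) | set(actual)):
        planned_value = planned.get(feature)
        actual_value = actual.get(feature)
        if planned_value != actual_value:
            discrepancies[feature] = {
                "planned": planned_value,
                "actual": actual_value,
            }
    return discrepancies


discrepancies = find_discrepancies(baseline, observed)
for feature, values in discrepancies.items():
    # The reason for each discrepancy still has to be documented by hand.
    print(f"{feature}: planned={values['planned']!r}, actual={values['actual']!r}")
```

A listing of this kind only flags where the two data sets differ. Judging whether a difference is significant, and documenting why it occurred, remains the work of the program director and evaluator.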

Discrepancies in program operations should not necessarily be viewed
as a negative finding, since there are many reasons why a program may
deviate from its original design. This information, however, is very
important in determining whether the deviation affected the
instructional program in a way that influenced student performance. For
example, the program may have been designed to provide one hour of
instruction in social studies using the students' native language.
However, due to scheduling modifications, a teacher shortage, or other
factors, a change was made during the fourth month of the program and
instruction in the home language did not occur. The evaluation,
nevertheless, was designed to assess student performance in social
studies. The resulting student outcomes could show that progress was
minimal. An immediate conclusion would be that the program failed.
However, knowing that instruction in the students' native language did
not occur, the program director and evaluator can explain the
resulting student outcomes. The questions to be addressed, then, are:
Why was the program design changed? Should the original design be
reinstated? Do performance data from students who received
instruction in their native language show achievement? Answers to
these and other questions begin to formulate a set of findings and
recommendations for the improvement of the overall program. This
interpretation activity also begins to merge and integrate the two
types of data from the evaluation.

While the example above ties the evaluation of program operations to
the evaluation of student outcomes, it should be clearly understood
that the primary purpose of this part of the evaluation is to examine
and monitor the manner in which the program is being implemented.
Additionally, the discrepancy evaluation design makes no attempt to
infer or determine program impact.

This chapter, therefore, describes procedures for evaluating the
instruction, staff development, and parent involvement components of
the bilingual program. While there are other facets of program
operations that merit attention, these components are the most
significant to the overall operation of the program. The level of
effort allocated to the evaluation of each of these components depends
upon its emphasis and/or importance to the overall program, as
established during the priority setting activities of the planning
process. These issues should be addressed and resolved by the program
director and evaluator in planning and designing the evaluation (see
Chapter I).

Most of the activities required to evaluate the program operations are
conducted throughout the program year rather than at one time. They
can therefore be properly planned and scheduled, taking the other
responsibilities of program personnel into consideration. Program
personnel should be aware, however, that as various activities of the
evaluation process begin to feed data into the analysis activity,
analysis may become taxing if not planned properly. The program
director, with assistance from the evaluator, must therefore schedule
the analysis, interpretation, and reporting activities with this in
mind.

The guidelines and procedures recommended in this chapter, in
conjunction with those in Chapter II, may appear to be overwhelming in
light of the program's resources. In reality, it should be possible to
conduct the prescribed procedures well within the resources of
the program. The Handbook recognizes the fact that most bilingual
programs, in addition to their personnel, have an average budget
of only approximately $2,000 - $5,000 per year to secure the services
of independent evaluators. The baseline data gathering activity may
require extensive effort the first time that it is performed; however,
updating the data for use in subsequent years should not require a
great amount of time. The majority of the evaluation activities, if
properly planned and scheduled, should be within the capacity of the
program personnel to perform.




GUIDELINES FOR CONDUCTING THE EVALUATION OF PROGRAM OPERATIONS
1. Evaluate the Program Instruction Component 



The evaluation of the instructional program is intended to answer the
following two questions:

1. Are planned instructional methods actually being
used?

2. Are changes needed in the instructional methods?

Data needed to answer these questions are obtained by observing
classroom activities and interviewing program teachers and
administrative staff. This core of information is then compared to
baseline information, obtained through activities described in Chapter
II (Worksheets No. 7 through 10), in order to determine if the program
is operating as intended. The program director is assumed to have the
primary responsibility for conducting activities that monitor program
instruction. Therefore, the program director will need to fine-tune
the recommended procedures and worksheets to ensure that the unique
needs and intents of the bilingual education program are met. The
instructional program is the core of the bilingual program. The
program director must ensure that the level of effort allocated to
evaluate this activity is appropriate.

Information on operating instructional programs is obtained by (a)
conducting classroom observations, (b) interviewing the teachers whose
classrooms are observed, and (c) conducting supplemental interviews
with a sample of program teachers and administrative staff. Each of
these activities is discussed below.

Conducting Classroom Observations — Prior to observing the classroom,
the program director should review the program description so that
program features which satisfy the goals and objectives can be
observed. The features to be observed should be identified during the
planning process. The program description is part of the baseline
data identified in Chapter II. Classroom observations should become a
planned activity of the program director. Following each informal
observation, the program director should write a summary of the
classroom instruction as it was observed. These brief summaries
should be synthesized into brief reports at least three times during
the year: in the fall, winter, and spring. Later, these brief reports
should be used during the comparison activity and incorporated into
the final evaluation report. Thus, over time, the program director
develops a complete picture of how the classroom instruction is
actually being performed. Quality information can only be acquired
from frequent, informal classroom visits, not from a few structured
observations.

Topical areas that should be observed by the program director will, of 
course, depend on how the particular program is designed. Some 
general categories or features to observe include: 

o Language use;

o Content of the lessons;

o Teaching methods;

o Diagnosis and grouping of students;

o Recordkeeping;

o Staff roles in the classrooms (teachers and
aides);

o Level of participation by students; and

o Attitudes and general morale of the students.

Worksheet No. 11, which follows, will assist the program director to 
develop a precise picture of classroom instructional activities. This 
worksheet should also be used by the program evaluator in conducting 
observations. The evaluator, who will have less time to spend in the 
classroom, should conduct several observations to see classrooms in 
operation. These observations can be informal or more structured 
depending on the need of the evaluation. These informal visits will 
provide a relatively unbiased outsider's perspective that is an 
insightful supplement to the program director's observations. In 
addition, this information will be beneficial to the evaluator in 
preparing the final evaluation report. 

How to Use Worksheet No. 11 — The Classroom Observation Schedule
(Worksheet No. 11) is designed to collect information about
instructional methodologies employed; amount of time instruction is
conducted in each language; variations for different student groups;
rate of presentation; indicators of self-concept development and
motivation; and the role of the various classroom personnel. This
worksheet, when completed by the program director or evaluator
conducting the observation, will provide information about program
instruction which, when compared to intended program data, will form
the basis for determining what changes have occurred in the program,
as well as provide information with which to make decisions for
program improvements.
CLASSROOM OBSERVATION SCHEDULE



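Date: ____________________        Class Hour: ____________________

Instructor: ______________        Observer: ______________________


I.  List the content areas covered during the class period as they occur.

    1. ____________________  time started: ________  time ended: ________
    2. ____________________  time started: ________  time ended: ________
    3. ____________________  time started: ________  time ended: ________
    4. ____________________  time started: ________  time ended: ________
    5. ____________________  time started: ________  time ended: ________
    6. ____________________  time started: ________  time ended: ________
    7. ____________________  time started: ________  time ended: ________
    8. ____________________  time started: ________  time ended: ________
    9. ____________________  time started: ________  time ended: ________


II. List the instructional methodologies employed as they occur during the
    period:



    Summary statement (enter at end of period):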



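III. The beginning and ending time for each of the instructional components
     of the class period can be indicated in item I above. In addition, the
     observer can indicate here estimates of how much time fell within each
     of three categories during each three-minute segment of the class
     period.

     Three-              On-task   On-task     Three-              On-task   On-task
     Minute   Off-task   Students  Students    Minute   Off-task   Students  Students
     Period   Time       Active*   Passive     Period   Time       Active*   Passive
     ------   --------   --------  --------    ------   --------   --------  --------
        1     ________   ________  ________      11     ________   ________  ________
        2     ________   ________  ________      12     ________   ________  ________
        3     ________   ________  ________      13     ________   ________  ________
        4     ________   ________  ________      14     ________   ________  ________
        5     ________   ________  ________      15     ________   ________  ________
        6     ________   ________  ________      16     ________   ________  ________
        7     ________   ________  ________      17     ________   ________  ________
        8     ________   ________  ________      18     ________   ________  ________
        9     ________   ________  ________      19     ________   ________  ________
       10     ________   ________  ________      20     ________   ________  ________

     * One or more students engaged in behavior for which they get feedback
       from the teacher.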

IV. Describe any variations in teaching approach used for different
    student groups (include any variations in pace of instruction for
    individuals or groups).

    Summary statement (enter at end of period):



V.  Describe any evidence of self-concept development and motivation,
    including indicators of (a) accepting the language of the student and
    (b) content that relates to the experience of the students.

    Summary statement (enter at end of period):



VI. Describe the role of all of the following personnel who were present
    in the classroom.

    (1) Teachers:


    (2) Aides:


    (3) Parents:


    (4) Parents:


    (5) Resource staff:
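
The three-minute estimates recorded in Section III of the Classroom Observation Schedule can be totaled once the observation is finished. The sketch below, in Python, assumes each segment has been entered as three estimates in minutes (off-task, on-task active, on-task passive). The unit, the sample values, and the function name are assumptions made for illustration and are not prescribed by the worksheet.

```python
# Each tuple is one three-minute segment:
# (off-task minutes, on-task active minutes, on-task passive minutes).
# The values are made up for illustration.
segments = [
    (0.5, 1.5, 1.0),
    (0.0, 2.0, 1.0),
    (1.0, 1.0, 1.0),
    (0.5, 0.5, 2.0),
]

LABELS = ("off_task", "on_task_active", "on_task_passive")


def summarize_segments(rows):
    """Total the minutes in each category and give each as a share of all time."""
    totals = {label: sum(row[i] for row in rows) for i, label in enumerate(LABELS)}
    observed_minutes = sum(totals.values())
    if observed_minutes == 0:
        shares = {label: 0.0 for label in LABELS}
    else:
        shares = {label: totals[label] / observed_minutes for label in LABELS}
    return totals, shares


totals, shares = summarize_segments(segments)
for label in LABELS:
    print(f"{label}: {totals[label]:.1f} min ({shares[label]:.0%})")
```

Totals of this kind from several observations can be set beside the planned use of class time in the baseline data when discrepancies are reviewed.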
Conducting Teacher Interviews — Interviews with the teachers whose
classes were observed may answer questions of whether instructional
methods have changed from the original planned instruction, the
reasons for the changes, and what changes in instructional methods may
be needed. Worksheet No. 12 provides a sample interview schedule to
use in conducting these interviews with teachers.

How to Use Worksheet No. 12 — This worksheet contains a series of
questions which help to direct the teacher interviews and may be used
by the program director or evaluator to interview the teachers whose
classrooms were observed. Interviews should be completed within a week
after each classroom observation by the individual who conducted the
observation, to ensure that the interview is focused on the particular
methodologies the teacher employed and the manner in which these
methods were utilized.
(Page 1 of 1)



PROGRAM OPERATIONS INTERVIEW SCHEDULE FOR TEACHERS



1. What are the major instructional methods that you employ?



2. Why do you use these particular methods, i.e., are these particular
methods directed to particular instructional objectives?



3. Are there other instructional methods that you would prefer to employ if
it were not for various circumstantial constraints that you face?



4. If so, what are these constraints?



5. What program changes would you recommend that would facilitate your
efforts to provide the best instruction possible?



6. How typical would you say the class period that we observed was in terms of
the instructional approach used and the nature and amount of interaction
with students? How was it atypical?



7. How do the entry and exit criteria and procedures actually used differ from
those planned for the project? (Interviewer: Be prepared to describe the
planned procedures.)
Supplemental Data Collection — In establishing the baseline data
(Chapter II), interviews were conducted with program personnel,
parents, and district personnel. Worksheets Nos. 8, 9, and 10 were
used to guide the interviews and record the responses from the
individuals about their understanding of the intended goals,
audiences, and activities of the bilingual program. A similar set of
activities needs to be undertaken to identify information about the
actual operation of the program. Thus, the final step in evaluating
the instructional program is to interview a sample of program
personnel, parents, and local and district administrators.
Information obtained from these interviews becomes a direct link to
the interview data used in establishing the baseline data. Comparing
these two sets of data is crucial in identifying discrepancies which
guide program improvement.

The program director should plan to re-interview a sample of program 
personnel as well as local and district administrators to elicit 
information about actual instructional operations. Worksheets Nos. 9 
and 10, when slightly modified, can be used as a tool to guide the 
interview and record responses. The program evaluator can modify and
use Worksheet No. 8 as a tool to re-interview the program director. 

Once the interviews have been completed, the information should be 
synthesized by the program director and evaluator. This information 
is then compared to the baseline data so that discrepancies between
"planned" and "actual" program operations can be noted.
Analysis of Program Instruction Data — A determination of whether or
not the instructional component of the program is operating as 
intended is made by comparing baseline information about the design 
and plan of the instructional program (see Chapter II) to the 
information acquired from the evaluation of program instructional 
activities. This comparison leads to the identification of 
discrepancies between intended and actual program operations. Noted 
discrepancies identify areas or issues which may require decisions to 
correct the discrepancies. Later, these discrepancies may also be 
taken into account in the interpretation of student outcome data if 
the changes in the instructional program are determined to have 
influenced student performance. The triad of intended
operations/instruction data, actual operations/instruction data, and 
student outcome data forms the basis for identifying final 
recommendations for the evaluation report. 

Interpretation and Use of Results — The results of these analyses are
presented to those persons responsible for decisionmaking. The
program director reviews and analyzes the data to determine if either
immediate or future changes should be sought in the program operations
and instructional methods employed. Frequent and immediate reports to
the program staff should be provided by the program director. Such
reports enable staff to review the intended changes, identify means of
implementing the changes, and, consequently, be a part of the program
improvement process.




Additional interpretation is performed by the evaluator. Using data 
from the various sources, the evaluator can examine the triad of 
intended instruction, actual instruction, and student outcomes to 
recommend changes which should be sought and ways to implement these 
changes.
2. Evaluate the Staff Development Component

The evaluation of the staff development activities of the
instructional program compares the actual training provided to
teachers to that which was planned. The comparison provides
decisionmakers with information about what training actually took
place and how this training is related to the intended goals of the
program, as well as whether the training met the needs of the program.
Specifically, the evaluation of the staff development activities
answers the following questions:


1. Were the staff development activities conducted as 
planned? 

2. Did staff training activities meet the needs 
identified at the onset of the program? 

3. Did staff participants acquire the intended 
knowledge and skills? 

4. Were staff satisfied with the training provided? 

5. Were skills acquired through training implemented 
in the classroom? 

Answers to these questions, when compared to the baseline information
(Chapter II, Worksheets No. 7 and 8), will identify discrepancies
between actual staff development activities and intended staff 
training, as well as provide information on the actual training. A 
variety of data collection methods can be employed to obtain the data 
needed to answer the above questions. Methods such as questionnaires, 
knowledge tests, and observations of instructional techniques can be 
used to provide the necessary information. 
Questionnaires — Information regarding satisfaction with and outcomes
of staff training activities can be obtained by questionnaires
completed by the program director and staff. Worksheet No. 13
provides a sample questionnaire which can be used to collect
information on the actual staff training activities.

How to Use Worksheet No. 13 — The Staff Development Questionnaire,
Worksheet No. 13, should be administered to the staff being trained by
the person(s) responsible for conducting the evaluation of the staff
training activities. This questionnaire provides information about
the type and duration of training; numbers of program staff involved
in the training; and planned and unmet expectations and objectives for
the training. These data should be collected within one week following
the completion of training activities which occur throughout the
program year, or at the very least, near the end of the program year.

Appropriate analytic methods for analyzing questionnaire data are
determined by the form of the data. The evaluator or appropriate
member(s) of the program staff should review the questionnaire
responses and systematically categorize the information according to
the evaluation questions posed.
STAFF DEVELOPMENT QUESTIONNAIRE



Name of training activity: ____________________
Date of training: ____________________



Name of person completing questionnaire (optional): ____________________



1. In general, what expectations did you have for the staff training
provided as part of this project?



2. To what extent were these expectations met?



3. Based on your knowledge of the objectives for this staff training, which
objectives do you think have been met?



4. Which objectives do you think have not been met?
Knowledge Tests — A more immediate source of information on the
impact of staff training is information derived from administering
knowledge tests to trainees during or at the end of the training.
These tests, devised by the instructors, should focus directly upon
the instructional content of the training. Because of the specificity
of such tests, no sample instruments are included in this manual. The
results of the knowledge tests can be examined from one or more
perspectives. The tests could be administered prior to and
subsequent to training, thus allowing comparisons to be made between
pre- and post-test scores. An alternative approach would be to use a
control group not involved in the training program as a basis for
comparison. An additional comparison could be made between the test
results and the stated objectives of the training program.
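
The pre- and post-test comparison mentioned above can be summarized with simple arithmetic. The following Python sketch assumes that the same trainees took both tests and that their scores are listed in the same order. The scores are invented for illustration, and the summary is descriptive only; it does not by itself show that the training caused any gain.

```python
from statistics import mean

# Hypothetical scores for the same five trainees, listed in the same order.
pre_scores = [12, 15, 9, 14, 10]
post_scores = [16, 18, 13, 15, 14]

# Gain for each trainee is the post-test score minus the pre-test score.
gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
mean_gain = mean(gains)
improved = sum(1 for gain in gains if gain > 0)

print(f"Mean pre-test score:  {mean(pre_scores):.1f}")
print(f"Mean post-test score: {mean(post_scores):.1f}")
print(f"Mean gain:            {mean_gain:.1f}")
print(f"Trainees who improved: {improved} of {len(gains)}")
```

The same per-trainee gains could also be set against a target stated in the training objectives, if the program has defined one.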

Observation of Instructional Techniques — The classroom observation
process should yield information on the instructional approaches that
are actually being used by teachers. To the extent that staff
training is expected to affect instructional approaches used by
teachers, the data acquired from the classroom observations are also
pertinent to determine whether or not the training accomplished its
purposes and is being implemented as planned. For example, it may be
possible to determine if staff development activities intended to
provide teachers with skills that are to be used in the classroom
(such as how to use new materials, or administer tests) were
successful by observing the teachers in the classroom.
Classroom observation data should be analyzed according to procedures
described earlier in this chapter in order to identify discrepancies 
between intended and actual staff development activities. 
Specifically, the major goals of the staff training which pertain to 
teachers' instructional approaches (Worksheets No. 7 and No. 8) should 
be compared with actual classroom practices as evidenced by classroom 
observation data (Worksheet No. 11). 

Interpretation and Use of Results — The program director should
examine the results of the analyses described above and determine if
the goals of the staff training were met, as well as decide whether
findings related to staff training can be issued periodically
throughout the program year, possibly in conjunction with recommended
changes in program instructional operations. Program personnel then
will be able to provide reactions to the recommended changes and
identify possible approaches for implementation.

3. Evaluate the Parent Involvement Component

The evaluation of the parent involvement component should address four
questions. These questions are:

1. To what extent did the level of parent involvement
match the planned level?

2. Were parents satisfied with their level of
involvement?

3. Was the program staff satisfied with the level of
parent involvement?

4. To what extent and in what ways has parent
involvement changed over the life of the program?

Data collected to answer these questions, when compared to
information about the planned level of parental involvement
identified in Chapter II (Worksheets No. 7 through No. 10), should
reveal whether discrepancies exist. Data needed to answer these
questions can be gathered by conducting a variety of interviews.

Parental Involvement Interviews — A comprehensive interview should
be conducted with the individual most knowledgeable about parent
activities. This person could be the program director, parent
activities coordinator, principal, or some other staff member. The
Interview Schedule for Leader of Parent Activities (Worksheet No. 14)
can be used to conduct this interview. This worksheet can also be
used to elicit information from a sample of program staff and
administrators about actual parental involvement activities. In
addition, several interviews should be conducted to obtain information
from a sample of parents whose children are involved in the program.
Worksheet No. 15 provides a sample interview schedule for conducting
these interviews.

The Parent Interview Schedule (Worksheet No. 15) can be used to
conduct parent interviews either in person or by telephone. The
evaluator should select a representative sample of parents to be
included in this evaluation activity. Parent involvement interviews
should be conducted during the last few months of the program year.
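
One way to draw a sample of parents is to select at random within groups, so that every group is represented. The Python sketch below assumes the roster is grouped by the child's grade and that the same fraction is drawn from each grade. The grouping, the roster, the fraction, and the function name are all hypothetical, and the Handbook does not prescribe a particular sampling method.

```python
import random

# Hypothetical roster: parents of program students, grouped by the child's grade.
parents_by_grade = {
    "grade 3": [f"parent_3_{i}" for i in range(10)],
    "grade 4": [f"parent_4_{i}" for i in range(5)],
    "grade 5": [f"parent_5_{i}" for i in range(20)],
}


def select_parent_sample(roster, fraction, seed=0):
    """Randomly pick the same fraction of parents from every grade."""
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    sample = {}
    for grade, parents in roster.items():
        # At least one parent per grade, never more than are listed.
        count = min(len(parents), max(1, round(len(parents) * fraction)))
        sample[grade] = sorted(rng.sample(parents, count))
    return sample


sample = select_parent_sample(parents_by_grade, fraction=0.2)
for grade, chosen in sample.items():
    print(grade, len(chosen), chosen)
```

Other groupings, such as home language or school building, may be more appropriate for a given program.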



How to Use Worksheets No. 14 and 15 — These worksheets provide
guidelines for interviewing a sample of program staff and the
individual most knowledgeable about parent activities. Depending upon
the program's information needs, certain questions can be pursued with
more or less detail and others can be omitted. It may be desirable to
add questions which assess the degree of involvement and
satisfaction of parents with the program.

Analysis of Collected Data — The program director or evaluator should
analyze the data through a simple process of categorizing responses to
open-ended questions, and recording simple averages and tallies of the
frequency of various activities. These data can then be used in
subsequent interpretations.
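
The tallies and averages described above can be produced with a few lines of Python. The sketch assumes that each open-ended response has already been assigned one category by the person reviewing the interviews and that a count of events attended was recorded for each parent. The categories and numbers are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical category assigned to each parent's open-ended response.
coded_responses = [
    "attended meetings",
    "helped in classroom",
    "attended meetings",
    "no involvement",
    "attended meetings",
    "helped in classroom",
]

# Hypothetical count of school events attended by each interviewed parent.
events_attended = [3, 1, 4, 0, 2, 2]

category_tally = Counter(coded_responses)
average_events = mean(events_attended)

for category, count in category_tally.most_common():
    print(f"{category}: {count}")
print(f"Average events attended: {average_events:.1f}")
```

Assigning responses to categories is itself a judgment call, so the categories used should be written down alongside the tallies.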



Interpretation and Use of Results — Data interpretation should
provide a thorough description of current activities and compare
actual parental activities to previously determined goals. A
consistent and compatible set of recommended changes and future goals
can then be established. The evaluator, program director, Parent
Advisory Council chairperson, and key staff should review the data and
identify changes to be made for the upcoming program year.

The program director and evaluator should report findings to the staff
periodically throughout the year, along with any recommended changes
in program operations. The staff's reactions and suggestions should
then be solicited so that the desired changes can be made through
mutual endeavor.
INTERVIEW SCHEDULE FOR LEADER OF PARENT ACTIVITIES

1. What is the general scope of parent involvement which was planned for the
project this year?



2. To what extent have these goals changed since the beginning of the project
year?



3. To what extent have these goals been met?



4. Are you satisfied with the level of parent involvement? Is the staff as a
whole satisfied?



5. To what extent and in what ways has parent involvement changed over the
life of the project?



6. What are the most positive aspects of parent activities?



7. What aspects of the parent involvement have the most potential for
improvement?



8. What changes are you recommending be made in parent activities in the
future?
PARENT INTERVIEW SCHEDULE
(C) 1. To what extent have you been involved in school affairs?



(?) 2. To what extent are you aware that the school has gotten
suggestions and reactions from the community in planning
its bilingual education program?



(C) 3. How much community support do you believe there is for the
bilingual education project?



(?) 4. How much education has the school district provided for you
as a parent as part of the bilingual education project?



(P) 5. To what extent are you aware that the school has provided
parent counseling or conferences?



(?) 6. What information have you received about the bilingual
education project from the school district?



(P) 7. The bilingual program has as one of its goals (fill in the
goals related to parent involvement). To what extent do you
think this goal has been met? What evidence do you know of
that indicates this goal has been met?
4. Analyze and Interpret Program Operations Data



The analysis and interpretation of program operations data is a
straightforward comparison activity. The evaluator simply examines
and compares the information collected on the actual operation of the
program to the baseline information describing how the program was
meant to operate. For example, if the goal of the program was to
provide instruction in all academic subjects using the native language
of the students, the analysis function, using the second set of
information, simply ascertains if this indeed occurred. If the goal
was met, the analysis activity documents this. If the instruction did
not occur, the analysis activity also documents this and should
attempt to ascertain what caused the change in the program design.
Both types of findings are recorded and reported in the overall
evaluation report. This type of comparison analysis is all that is
needed by this component of the evaluation.



Interpreting the findings, or attempting to find an association between
the findings of this facet of the evaluation and the results obtained
from the student outcomes component, should be performed very
cautiously. The two sets of information are not meant to be
"scientifically merged" in accordance with sound methodological
evaluation practices. However, an alert and perceptive evaluator may
be able to develop some "intelligent perceptions" about the program
based on the two sets of information. For example, knowing that
history was taught using the home language in the fourth grade, but
not in the fifth, the evaluator may want to closely examine the
student outcome data for these two grades. If the data from the
fourth grade students show significantly higher achievement than those
of the fifth graders, the evaluator can highlight this fact and then
present a "professional opinion" suggesting that the instruction in
the native language fostered this difference in achievement.

5. Report the Evaluation Results

The information resulting from the evaluation of program operations
should be summarized, written, and presented in the format in which it
will appear in the Final Evaluation Report. The format for reporting
the results will most likely be the same used to establish the
baseline data. However, the reporting should contain a section on the
evaluation findings and the recommendations being made to improve the
program.
CHAPTER IV

CONDUCTING THE EVALUATION OF STUDENT OUTCOMES 

The most important goal of any educational program is to improve the 
performance of the students enrolled in the program. Therefore, 
determining student outcomes is perhaps the most important part of a 
program evaluation. The purpose of this chapter is to describe 
procedures for evaluating student outcomes. The student outcomes to 
be evaluated can be divided into the following four areas: English 
(L2) language skills; non-English or first (L1) language skills; 
academic achievement (e.g., in science, social science, and 
mathematics); and affective areas of student performance. 



Conducting an evaluation of student outcomes is neither very technical 
nor complicated if the evaluation is designed to simply describe 
student performance. A student performance evaluation is interested 
only in determining how the students in the program performed, rather 
than determining what caused the observed level of performance. An 
attempt to measure the latter requires a more comprehensive evaluation 
design than the former. These two different approaches to the 
evaluation of student outcomes are commonly referred to as evaluations 
of student performance and program impact or effectiveness. The terms 
program impact and program effectiveness are used interchangeably in 
this Handbook. 

These two types of evaluations are widely confused when conducting 
evaluations of most educational (bilingual and other) programs. In 
particular, many evaluation reports make statements about program 
impact or effectiveness when actually they have only measured student 
performance. That is, they have observed that students have done 
better (or worse) than some standard or comparison group and then have 
taken the unwarranted step of concluding that the program was 
responsible. This distinction is so important for those who plan to 
use evaluation results that a discussion of these two types of 
evaluation is presented below. 

Evaluating Student Performance 

Evaluations of student performance and evaluations of program impact 
are both based on the same kinds of measures, such as test scores or 
other quantitative measures like attendance rates. In both types 
of evaluation, student scores are compared to some scale or standard 
to give them meaning. Evaluations of student performance usually 
group student standards of performance into two categories. Those 
are: 

o Absolute standards of performance which compare 
performance such as: 

Comprehension level (of textbooks, 
newspapers, job application forms, 
etc.); 

Mastery of specific skills such as 
language, math, or science; or 

Proportion of days present in 
school. 

These standards of performance are measurable in absolute terms. 
That is, they provide information on what a student can or cannot do 
and are not compared to any other external criteria. 

o Relative standards of performance (typically 
reported as percentile ranks or standard scores) 
may compare student performance against: 

Norm groups (National, State, and 
local); 

Other bilingual students (National, 
State, and local); 

Groups of non-bilingual students in 
the same school or district; or 

Bilingual program students in 
previous years. 



These, of course, are only examples. There are many other comparisons 
that can be made. However, the more comparisons made the more 
technical the evaluation becomes, often resulting in inappropriate 
comparisons and misinterpretation of results. 
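
The difference between the two kinds of standards can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the cutoff, and all scores are hypothetical, and the percentile-rank formula (ties counted as half) is one common textbook definition, not a procedure prescribed by this Handbook.

```python
from bisect import bisect_left, bisect_right


def mastery_rate(scores, cutoff):
    """Absolute standard: share of students at or above a fixed cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


def percentile_rank(score, norm_scores):
    """Relative standard: percent of the norm group scoring below `score`,
    with ties counted as half (an assumed, common definition)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores for five program students and a ten-student norm sample.
program_scores = [12, 15, 18, 20, 22]
norm_group = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

print(mastery_rate(program_scores, cutoff=18))  # proportion meeting the cutoff
print(percentile_rank(18, norm_group))          # standing relative to the norm group
```

The first figure says what students can do against a fixed criterion; the second says only where a score falls relative to another group, so its meaning depends entirely on how appropriate that group is.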

Measuring absolute performance is often suggested as a solution to the 
many problems of evaluation, since absolute performance levels are 
supposed to indicate whether the students learned what was expected of 
them. Measuring absolute performance, however, is difficult because 
reliable tests are difficult to develop and criteria for success in 
academic areas are largely arbitrary. Nevertheless, absolute measures 
have an important role in evaluating bilingual programs, especially 
when testing bilingual students in their first language, since 
appropriate comparison groups may be difficult to find. 

Relative performance measures are probably the most common measures 
currently used to evaluate bilingual and other education programs. 
Standardized tests are the most widely used for this purpose because 
they enable comparisons, in the form of percentiles and standard 
scores, to be made of local student performance to that of a 
nationally representative norm group. However, locally made tests, 
attendance records, and virtually any other measures can also be used 
to compare bilingual students' performance to that of other students 
in the same district or school. 

Relative measurement, like absolute measurement, also requires 
adequate tests. However, relative measurement can be thought of as 
going a step beyond absolute measurement because it uses performance 
data from comparison groups, which provide criteria for success that 
are not completely arbitrary. Therefore, relative performance 
measures can be used to measure performance in English skills and 
academic subjects taught in English. However, using these measures in 
bilingual program evaluations is not without problems. There is a 
real danger of making unreasonable comparisons between the comparison 
group and the students in a bilingual program, resulting in 
unreasonable conclusions. For example, it may be useful to compare 
the English reading skills of a group of low-income bilingual program 
students to those of a group of affluent native speakers of English 
from the same district. Assuming that the bilingual students scored 
lower in reading, it would not necessarily mean that the bilingual 
program had failed or that the bilingual students could not learn, 
since low-income groups tend to score lower than affluent groups even 
where no language difficulties exist. Some evaluators, however, may 
arrive at the opposite conclusion. 



Measures of relative performance should be the backbone of student 
outcome evaluations measuring English-language skills and academic 
subjects tested in English. Performance in other languages generally 
must be measured in absolute terms because meaningful comparison 
groups will be difficult to find. 

Evaluating Program Impact 

Although determining the level of student performance should be the 
primary goal of most program evaluations, many evaluations attempt to 
go beyond this to demonstrate that the program is effective and 
responsible for the observed level of student performance. Explicitly 
or implicitly, this question of program impact underlies most 
evaluation designs. This Handbook recommends that bilingual programs 
do not attempt to conduct an impact evaluation. 

Demonstrating program impact requires documenting evidence that the 
program and nothing else was responsible for the student outcomes. 
This is more difficult than it appears. To do this, the impact 
evaluation design must immediately address and be able to answer two 
questions. The first question is what constitutes the "program" and 
the other is how the students would perform without the program. Most 
evaluations, however, never define exactly what the "program" 
includes. Implicitly, the program may be treated as the sum total of 
all the methods, materials, teachings, community factors, and other 
things that affect the students. If the evaluation is trying to 
determine whether the specific features (methods, curriculum, use of 
two languages, etc.) of the program are effective, then the definition 
of the program becomes very important. For example, some research 
(and the intuition of many educators) suggests that the teachers are 
the most important part of a successful program and that specific 
materials, methods and so on make much less difference. Therefore, an 
impact evaluation design must be able to differentiate results 
emanating from the methods and materials on the one hand, and the 
personnel on the other. 

A practical consequence of this distinction might be that even if the 
evaluation shows new methods to be effective when performed by the 
"best" teachers, it does not necessarily follow that the same methods 
should be adopted by all teachers in the program. In order to make 
such a decision, the program would have to be defined as being only 
the methods and materials; and the evaluation would have to 
demonstrate their effectiveness with a variety of teachers in a 
variety of settings. 

Determining how the students would perform without the program is a 
very troublesome question for a program impact evaluation. The data 
may show that students are meeting program objectives and that they 
score very well in comparison to National and/or local norms. This, 
however, does not prove that the program is effective. Someone might 
argue that the same students would do just as well or even better in a 
regular non-bilingual classroom or in an ESL program. 

The laboratory approach to answering this question would be to divide 
the students randomly into groups (one or more groups for each type of 
program) and then to compare the effects of the different programs 
after some reasonable amount of time. In practice, however, because 
of the diversity of services and the characteristics of bilingual 
students, this is almost never possible. The result is that the 
effect of a program cannot be separated from effects of other factors 
in a conclusive manner. Evaluations using data from a single 
academic year probably should not even try to prove impact. However, 
data collected over several years can probably be used to develop an 
argument that, while not completely definitive, will be reasonably 
convincing as to the impact of the program. Bilingual programs should 
attempt to collect multi-year data on student performance. 

Problems Associated With Accurate Measurement 

In addition to the issues described above, impact evaluations, as well 
as evaluations of student performance, are themselves affected by the 
measurement techniques available to measure performance. The 
predominant factor is the ability of the evaluation design and the 
evaluator to control the "noise" or, more commonly, the error of 
measurement. The characteristics of a bilingual program further 
complicate the problem. 

An Analogy: The Signal-to-Noise Ratio 

It is generally accepted that test scores include some measurement 
error, and that student performance is affected by many things outside 
of the program. To use the popular term from the stereo recording 
industry, these various kinds of errors can be thought of as the 
"noise" in any test score. To pursue the analogy, think of the true 
changes in student performance (which may or may not represent impacts 
of the bilingual program) as the "signal" in the test score, just as 
the music is the signal on a stereo tape or record. If there is a lot 
of noise in the stereo system, very soft passages of music will be 
lost in the hiss and static, although very loud passages may be quite 
clear. In the same way, if there is a lot of noise in an evaluation, 
small changes in student performance will be obscured, even though 
dramatic changes would show up quite clearly. 

The important issues for anyone involved in evaluation are (1) how 
much noise is there in a carefully done evaluation? and (2) can 
changes be expected in students (or impacts due to the program) that 
are big enough to stand out from the background of noise? 
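
The analogy can be made concrete with a rough calculation. The sketch below is an illustration only, not a method given in this Handbook: it assumes the "noise" is the standard error of the difference between two independent group means with the same score spread, and the function name and all numbers are hypothetical.

```python
import math


def signal_to_noise(gain, score_sd, n_students):
    """Ratio of an average gain (the "signal") to the standard error of a
    difference between two group means (the "noise").

    Simplifying assumption: two independent groups of equal size with the
    same standard deviation, so noise = score_sd * sqrt(2 / n_students).
    """
    noise = score_sd * math.sqrt(2.0 / n_students)
    return gain / noise


# Hypothetical figures: 50 students, test standard deviation of 10 points.
small_change = signal_to_noise(gain=2, score_sd=10, n_students=50)
large_change = signal_to_noise(gain=10, score_sd=10, n_students=50)

print(round(small_change, 2))  # about 1: the change is no bigger than the noise
print(round(large_change, 2))  # about 5: the change stands out clearly
```

Under these assumptions a two-point change is about the same size as the noise and would be easy to miss, while a ten-point change would be hard to overlook.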

To oversimplify a bit, the answer depends both on how well the 
evaluation is done and on the evaluation questions that are asked. It 
is probably safe to say that in the vast majority of program impact 
evaluations (for all kinds of programs, not just bilingual programs), 
we are dealing mainly with noise. On the other hand, questions that 
ask only about student performance can usually be answered quite well. 
This issue of error in measurement is explained more fully in the 
section on data analysis. 

The characteristics of bilingual programs which compound the 
problems of measurement error are: 

o Programs Cover Several Grades -- Most bilingual 
programs cover several grades and are often started at the 
lower grade levels and expanded upward, one grade per 
year. Therefore, a K-6 program cannot be evaluated by 
simply observing one or two of the lower grades, but 
requires a multi-year evaluation design. Multi-year 
evaluations present many methodological problems. In 
fact, student turnover makes most program evaluations 
longitudinal in theory only. 

o Programs Change From One Year to the Next -- Bilingual 
education is characterized by new and constantly 
evolving instructional approaches, and the programs are 
under great pressure to provide immediate evidence of 
positive results. However, there is simply no way to 
do a meaningful outcome evaluation of a program that is 
only partially in place or is in a state of flux. 

o Different Students Get Different Instruction -- 
Meaningful evaluation requires a clear understanding of 
what happens to each student. Instruction in bilingual 
programs often varies widely among students, even 
within a single classroom. When the instructional 
program is described clearly, it becomes obvious that 
only a few students received any one treatment. This 
creates difficulty, since the different groups may be 
too dissimilar to aggregate, but too small to analyze 
separately. 

o Young Children are Difficult to Test -- The testing of 
young children, especially those below the third grade, 
is notoriously difficult. Many bilingual programs, 
however, focus heavily on the lowest grades. There is 
no obvious answer to this problem, but it should be 
acknowledged prior to conducting an evaluation. 



Popular "Solutions" That Do Not Work 



The frustrations generated by the kinds of problems described above 
have led to many misguided attempts to find solutions. Some fail to 
answer the impact question, but do answer other questions of possible 
interest. Others are of no use at all. Approaches that should never 
be used are: 



o Raw score post test minus raw score pretest for 
English language subjects. In lieu of any better 
ideas, many evaluators simply subtract raw score 
pretest scores from post test scores and compute 
the difference. Since almost all groups of 
children make some gains in English language 
subjects, even when they are falling rapidly 
behind their peers, this approach is of no value 
at all for these subjects. A popular variation, 
selecting a gain of some arbitrary number of 
raw-score points as the program target, is no 
improvement. 

o Grade-equivalent scores (the month-for-month gain 
myth). Analyses based on grade-equivalent scores 
still, unfortunately, appear all too frequently. 
They are based on the mistaken belief that a gain 
in test scores of one or more months for each 
month of instruction represents good progress. 
This is not true. Grade-equivalent scores provide 
an illusion of simplicity but, in fact, they are 
virtually impossible to interpret, even for 
specialists in test construction. 
Grade-equivalent scores should never be used for 
any purpose whatsoever. 

o IQ-based formulas. From time to time, an attempt 
to use IQ scores appears as the basis for 
evaluating reading or math performance. The idea 
that IQ tests provide an absolute standard against 
which to compare a specific skill is simply a 
misunderstanding. IQ-based formulas are not 
appropriate for use in bilingual program 
evaluations. 

o Subjective data. As a last resort, evaluators 
sometimes fall back on subjective data, usually 
teacher reports. Such reports are always useful in 
interpreting results and supplementing 
standardized scores. However, they can never be 
assumed to represent reliable, valid measures of 
student performance when used alone. 
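
The weakness of the raw-score gain approach can be seen in a small worked example. This is a sketch under stated assumptions: the standard score used here is a simple z-score (distance from the norm-group mean in standard deviation units), and every number and name is hypothetical rather than taken from any real test.

```python
def z_score(raw, norm_mean, norm_sd):
    """Standard score: distance from the norm-group mean in SD units."""
    return (raw - norm_mean) / norm_sd


# Hypothetical program-group averages on the same test, fall and spring.
pre_raw, post_raw = 20, 26

# Hypothetical norm-group mean and standard deviation at each testing date.
pre_norm_mean, pre_norm_sd = 30, 10
post_norm_mean, post_norm_sd = 40, 10

raw_gain = post_raw - pre_raw                            # positive raw gain
pre_z = z_score(pre_raw, pre_norm_mean, pre_norm_sd)     # standing in the fall
post_z = z_score(post_raw, post_norm_mean, post_norm_sd) # standing in the spring

print(raw_gain, pre_z, post_z)
```

In this made-up case the group gains six raw-score points, yet its standing relative to the norm group drops, because the norm group gained more over the same period. A raw gain by itself cannot distinguish this situation from real progress.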

In an effort to find appropriate solutions to these problems, 
evaluators have turned to practices which appear to solve these 
problems. However, some of these practices are often misused. 
Approaches that are widely misused are: 



o Criterion-referenced testing. Some evaluators 
suggest that criterion-referenced tests can solve 
the major problems faced by evaluators. Actually, 
what the criterion-referenced test advocates have 
done is to change the question that is being 
asked. Criterion-referenced tests can provide 
information as to whether program objectives have 
been met. However, measuring student performance 
or program impact still requires reliable, valid 
tests with an adequate range (no floor or ceiling 
effects). In principle, criterion-referenced 
tests could meet these requirements but, in 
practice, most do not. 

o Gap-reduction models. "Gap-reduction" is a term 
that appears in the bilingual program evaluation 
literature. It usually means either (a) students 
get closer to the national norms, or (b) students 
get closer to some dissimilar comparison group. 
The former is simply an application of the 
norm-referenced model, which is useful for 
student-performance evaluation but generally not 
for program-impact evaluation. The latter is an 
example of non-random comparison groups (see 
below). The important point is that 
"gap-reduction" is simply a new name for familiar 
designs. The new name does not change their 
strengths or weaknesses. 

o Non-random comparison groups. Many bilingual 
program evaluations make use of non-random 
comparison groups, that is, different kinds of 
students who are receiving different instructional 
treatments. As part of any evaluation of student 
performance, such comparisons may be of great 
interest to local decision makers and program 
staff. In general, however, such comparisons do 
not by themselves provide program impact 
information because student differences are 
confounded with program differences. 




By this time, the program director may ask if there is really any use 
in conducting the evaluation. The answer is yes, provided that the 
program director and evaluator fully understand the problems. 
For these reasons, the Handbook strongly recommends that 
evaluations of bilingual programs concentrate their efforts on 
conducting evaluations of student performance, rather than impact, 
when evaluating student outcomes. This, together with the evaluation 
(description) of program operations, meets the Federal requirements and 
provides the program with sufficient information with which to 
make informed decisions on how to improve the program. 

1. Developing the Evaluation Design 



The first step in performing the evaluation of student outcomes is to 
determine the type of evaluation that will be conducted and what 
questions the evaluation is designed to answer. The type of 
evaluation conducted, however, must address the minimum Title VII 
requirements. 

Title VII requires that bilingual program evaluations include 
provisions for measuring the accomplishment of the instructional 
objectives, the progress of the students in improving their English 
language skills and a procedure for using the information to improve 
the operation of the program. Meeting these requirements is 
relatively simple and can be accomplished by following the procedures 
recommended in the Handbook. In order to meet these requirements, the 
Handbook recommends conducting an evaluation of student performance, 
rather than attempting to determine program impact. This can be 
accomplished by using the basic evaluation design provided in this 
Handbook. 

The Basic Evaluation Design 

Because of the difficulty in conducting program impact evaluations, 
the recommended approach to evaluate student outcomes is simply to 
evaluate student performance. This approach is referred to in this 
Handbook as the basic evaluation or the basic evaluation design. This 
basic evaluation design, therefore, answers only the relative 
performance question, "To what extent are the bilingual students 
achieving?" 

The basic design has minimal requirements. These are: 

o testing only the students enrolled in the 
bilingual program; 

o using adequate norm-referenced tests (NRTs) 
capable of measuring English language skills, 
first (L1) language skills, if applicable, and 
academic subjects (e.g., math, science, etc.); 
and 

o measuring performance for only one academic year. 

Applying these minimal design requirements to the first student 
outcome component, English language performance, is all that is 
required to meet the Federal evaluation requirements. However, most 
bilingual programs should at least evaluate performance in two other 
outcome areas, first (L1) language and academic subjects. 
Additionally, although the basic design does not require a multi-year 
evaluation design, the Handbook does recommend that bilingual programs 
attempt to collect multi-year performance data. At a minimum, 
programs should strive to collect data over the duration of their 
grant period. It is conceivable that data showing progress over the 
life of the program can be used to argue that the bilingual program 
was responsible for the outcome. 

Expanding the Evaluation 

Programs wishing to extend the evaluation beyond a description of 
student performance to measure program effectiveness and/or impact 
will need to enhance the requirements of the basic design. At a 
minimum, these evaluation designs may require three modifications. 
First, they will have to obtain test scores for comparison purposes from 
students enrolled in other bilingual or non-bilingual programs. In 
practice, this option may only be realistic for programs located in 
school districts that employ district-wide testing programs, where 
scores for all district students are readily available through 
computer services or some other easy-to-use form, or if a comparable 
group of students can be identified and tested. 

Single-year evaluations only serve the purpose of the basic evaluation 
design and can only document if the program is effective compared to 
baseline data, but cannot show year-to-year changes. Therefore, 
evaluations attempting to measure effectiveness will most likely 
require multi-year evaluation designs capable of tracking students 
throughout their participation in the program. Multi-year evaluations 
require the use of the same measurement instruments throughout the 
evaluation period and strict recordkeeping. 
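
The recordkeeping requirement is easier to appreciate with a small example of how turnover shrinks the group that can actually be tracked. The sketch below is illustrative only; the function name, year labels, student identifiers, and scores are all hypothetical, and it assumes scores from the same test are kept per student for each year.

```python
def longitudinal_cohort(scores_by_year):
    """Return the IDs of students who have a score in every year.

    `scores_by_year` maps a year label to a {student_id: score} mapping
    and must contain at least one year.
    """
    yearly_ids = [set(year_scores) for year_scores in scores_by_year.values()]
    return set.intersection(*yearly_ids)


# Hypothetical records for three program years.
scores_by_year = {
    "year1": {"s01": 31, "s02": 28, "s03": 35, "s04": 22},
    "year2": {"s01": 38, "s03": 41, "s04": 30, "s05": 27},
    "year3": {"s01": 44, "s03": 47, "s06": 33},
}

cohort = longitudinal_cohort(scores_by_year)
retention = len(cohort) / len(scores_by_year["year1"])

print(sorted(cohort))  # students who can be followed across all three years
print(retention)       # share of the first-year group still trackable
```

In this made-up case only half of the first-year students remain in the records by the third year, which is why multi-year designs depend on consistent student identifiers and careful records.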

Evaluations attempting to measure effectiveness will most likely also 
need to expand their measurement instruments beyond norm-referenced 
tests. These may include criterion-referenced tests (CRTs), mastery 
tests, and other types of measures. Some programs administer these 
tests as part of their instructional program. The costs to include 
results from these tests in the evaluation could be minimal and very 
productive. 

How to Select Among the Options 

Ideally, if you want the most 
complete picture of your program, you should include all of the 
options. This Handbook certainly recommends that you incorporate any 
options that can be added at little cost in money and effort. Beyond 
that, you must decide on the basis of tradeoffs between the amount of 
effort involved and the importance of the additional evaluation 
questions that can be answered by adding the different options. The 
levels of effort and additional resources required for adding the 
options depend very much on local factors such as the ones described 
below. 



o Use of local comparison groups. Identifying and 
testing local comparison groups can easily double 
the level of effort of your evaluation. On the 
other hand, if your district has a district-wide 
testing program with computerized results, 
comparison group data may be available to you at 
little or no cost and minimal effort. 

o Use of CRTs (and other tests). Many programs 
administer diagnostic or mastery tests as part of 
the instructional program. It may be easy to 
include results from these tests in the 
evaluation. At the opposite extreme, some 
programs make elaborate attempts to develop tests 
to measure local objectives. Such an effort may 
be useful for monitoring instruction, but is 
probably not justified for purposes of evaluating 
student outcomes. 

o Using longitudinal evaluation. The main 
requirements for longitudinal evaluation are 
continuity of personnel, proper planning, and 
careful recordkeeping. This option is essential 
if you are really interested in monitoring the 
progress of your students. Single-year 
evaluations serve little purpose beyond meeting 
funding-agency requirements. 

o Using and developing baseline data. Baseline data 
are obtained by testing bilingual students before 
the program starts. If these data do not already 
exist (e.g., from a district-wide evaluation 
program), they cannot be reconstructed. Before 
considering baseline input, make sure your 
district maintains all of the required information 
in a form that a new bilingual program can use. 



Additional Evaluation Questions Can Be Answered 

In the introduction 
to this chapter, we discussed three kinds of student outcome 
evaluation questions: (a) absolute student performance, (b) relative 
student performance, and (c) program impact. The basic evaluation 
consists of administering a norm-referenced test to the program 
students. This design lets you answer the relative performance 
question "How do students compare to a National norm group?" The 
options, described above, that you add to the basic design will 
determine which additional questions you can answer. These may 
include: 

o Absolute student performance questions. In 
general, these questions require the addition of 
appropriate tests such as CRTs, mastery tests, etc. 

o Other relative student performance questions. The 
different options enable you to compare your 
students to various other groups such as (a) other 
(dissimilar) students in the district (from 
comparison groups), (b) previous program students 
(from longitudinal designs), and (c) pre-program 
students (from baseline data). 

o Program-impact questions. Each piece of 
student-performance information will provide some 
clue for possible program impacts. However, 
strong evidence would have to include both (a) 
evidence that students had improved as compared to 
baseline data, and (b) evidence that other students in the 
district had not made a similar improvement (local 
comparison groups). You will also need evidence 
that the characteristics of program students 
(entering language skills, SES, etc.) have not 
changed. Longitudinal data can show impact if the 
program is improving each year. However, a 
program could be very effective as compared to 
baseline data, but show no changes from year to 
year. 

Preparing for the Evaluation 

Because the evaluation resources are limited, the evaluation may not 
be able to answer all questions. Priorities must be determined with 
respect to the most useful information to be obtained from an 
evaluation. The evaluation does not have to provide data on each 
student's learning outcome. The evaluation may provide data only on 
the students as a group. For example, measurements may be made of 
changes in reading achievement of third graders but not on reading 
achievement of a specific student in that grade. The evaluation does 
not have to provide data on sub-skills such as phonetic analysis but 
rather on general skill levels such as reading achievement. 

Certain decisions must be made before any data are collected to ensure 
that the analyses can be conducted as desired. Program goals need to 
be organized according to several key student or program features such 
as: 

o Subject area (e.g., reading, writing, speaking); 

o Language used in instruction (e.g., English, 
Spanish); 

o Student language proficiency category (e.g., 
English: limited or proficient, Spanish: limited 
or proficient); 

o Grade level of students; and 

o Year of the program. 

Worksheet No. 16 allows the evaluator to organize students according 
to these categories in preparation for measuring student performance. 
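
Where scores are kept in machine-readable form, the same organization can be sketched in a few lines. This is only an illustration of grouping by the categories listed above; the field names, the `group_means` helper, and all scores are hypothetical and are not part of Worksheet No. 16.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical field names matching the five categories in the text.
CATEGORY_KEYS = ("subject", "instruction_language", "proficiency",
                 "grade", "program_year")


def group_means(records, keys=CATEGORY_KEYS):
    """Average score for each combination of the chosen categories."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["score"])
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothetical student records; results are reported for groups only.
records = [
    {"subject": "reading", "instruction_language": "English",
     "proficiency": "limited", "grade": 3, "program_year": 1, "score": 40},
    {"subject": "reading", "instruction_language": "English",
     "proficiency": "limited", "grade": 3, "program_year": 1, "score": 50},
    {"subject": "reading", "instruction_language": "Spanish",
     "proficiency": "proficient", "grade": 3, "program_year": 1, "score": 60},
]

for group, average in group_means(records).items():
    print(group, average)
```

Deciding on these categories before testing matters because a category that was never recorded for each student cannot be used to break out results afterward.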

How to Use Worksheet No. 16 

The Evaluation Design Worksheet is to 
be used as a planning worksheet for developing the evaluation of each 
of the four areas of the student outcome component of the evaluation. 
The worksheet provides space for listing the different languages and 
subject areas to be evaluated and the tests to be used. 
The comparison data and the evaluation questions to be 
answered for each of the four areas (English language skills, first 
language skills, student achievement, and affective areas) may also be 
listed. 



This worksheet will aid the program director and/or evaluator in 
keeping track of the decisions to be made for each outcome area. The 
program director and/or evaluator will need a separate worksheet for 
each area. Thus, multiple copies of this worksheet will have to be 
made. In filling out Section I of the worksheet, list the subject 
areas to be evaluated, the test to be used (name), and the language in 
which the test is to be administered (e.g., reading, CTBS in Spanish). 
In the case of norm-referenced tests, list the form, level, and date 
of the testing. For other tests, such as criterion-referenced or 
teacher-made tests, provide a brief description of the skill(s) they 
are designed to assess. 

In Section II, program and student description, list the grade levels 
in which the subject areas are to be evaluated, the students' language 
skills, and any other descriptions such as students enrolled in a 
special language laboratory. At the time of analysis, this information 
will enable the evaluator to break out the students into separate 
groups by grade level, language skill, and possibly by any relevant 
program feature (e.g., students attending a special language 
laboratory). 

For Section III of the worksheet, identify the student groups that the 
bilingual program students will be compared to, the test to be used in 
these comparisons, and whether these comparisons involve current or 
past year test scores. If norm-referenced tests are not to be used, 
there will be no norm-referenced comparisons to be made. However, 
scores from district-developed mastery tests or criterion-referenced 
tests for similar or past students can be used to estimate the 
progress of students currently enrolled in the bilingual program. 

Section IV requires a description of the actual comparisons to be made 
in addressing each evaluation question. In the section on Student 
Performances, indicate the relative comparisons to be made. An 
example would be comparing scores of students in the bilingual program 
with student groups identified in Section III. The absolute standards 
of performance require identification of past or current similar 
student progress and which mastery or criterion-referenced tests were 
used. 

EVALUATION DESIGN WORKSHEET 

I.   Subject Area and Language: ____________________ 

     Tests:  NRT:   ____________________ 
             Other: ____________________ 

II.  Program Student Description: 

     Grade Level(s): ____________________ 
     Language Skills:  English: __________  Other: __________ 
     Other Descriptors: ____________________ 

III. Comparison Data (Groups and Years) 

     Student Groups      Test Code      Current Year      Earlier Years 

     A. 

     B. 

     C. 

IV.  Evaluation Questions 

     Student Performance 

     1. Relative Standards 
        of Performance: 

     2. Absolute Standards 
        of Performance: 

2. Evaluating the English Language Component 

The English language skills to be evaluated are the fundamental 
components of language use. These include knowledge of the sound 
system for oral language and comprehension of the orthographic 
system for written language. While each of the four language skill 
areas -- listening, speaking, reading, and writing -- can be 
considered individually, one component of language cannot easily be 
isolated from another. It simply cannot be assumed that mastery of 
one skill area necessarily indicates mastery of a related skill area; 
nor can it be assumed that lack of skill in one area indicates lack of 
skill in another. For this reason, the model recommends that 
proficiency in all four language skill areas be assessed. 

The identification of appropriate norm-referenced or 
criterion-referenced instruments is essential to conducting this facet 
of the evaluation. Although numerous instruments exist, many are not 
comprehensive or organic in design. This means that the evaluator must 
carefully select instruments or components of instruments to meet the 
evaluation objectives. 

Measurement of oral language and listening comprehension can be 
performed by using informal measures. Informal reading inventories or 
cloze tests may be used to determine the basic reading level of the 
student. Informal written criterion-referenced measures may be useful 
for assessing basic writing skills. The evaluation of the language 
component may overlap with the academic achievement component if 
norm-referenced measures are used to assess the literacy skill areas. 

Three Basic Design Decisions 

For practical purposes, most programs must make three basic evaluation 
decisions: (a) which students to include, (b) what tests to use, and 
(c) what period of time to include. For each decision, the Handbook 
recommends a choice for a basic or minimal evaluation and then offers 
options that will let you answer additional questions if you have the 
necessary evaluation resources. 



Which students to include? The basic evaluation 
requires testing only the students enrolled in the 
bilingual program. An option could be to obtain 
data from other students in the district for 
comparison purposes. Theoretically, the bilingual 
program staff could pick out comparison groups and 
test them. In practice, though, this option is 
realistic only where there is a district-wide 
testing program, and the scores for all district 
students are readily available on computers or in 
some other easy-to-use form. 



What tests to use? The basic evaluation requires 
a reliable, standardized, norm-referenced test 
(NRT) of reading and other language skills. 
Usually, the test used for district-wide testing 
may be used. Options include criterion-referenced 
tests, teacher-made tests, mastery tests, and 
tests included as part of commercial instructional 
packages. We will refer to these kinds of tests 
generically as "CRTs, etc." 

What period of time to cover? The basic 
evaluation requires covering only one academic 
year and testing only once in the Spring. Two 
options are highly desirable: (a) multi-year 
designs following program students from one year 
to the next, and (b) baseline data on program-type 
students obtained before the program begins. A 
sub-issue is whether to test once or twice a year. 
The first choice should be to test only once a 
year in the Spring. Options are (a) once a year 
in the Fall or (b) twice a year, Fall and Spring. 

These basic choices can be summarized as follows: 

                          Basic Evaluation     Optional Additions 

1. Students               Program only         Comparison groups 
                                               from the district 

2. Tests                  NRTs                 CRTs (etc.) 

3. Term of Evaluation     Single year          Multi-year 
                                               Baseline data 

   (Time of Testing)      Spring only          Fall only 
                                               Fall and Spring 

Applying the Basic Design to the English Language Component 

The basic evaluation design through the use of a norm-referenced 
approach provides for comparing bilingual program students to a 
national sample of students who scored at the same pretest percentile 
on a nationally-normed test. For example, if the students in the 
bilingual program scored at the 25th percentile on the pretest, their 
growth can be compared to the growth of the students in the norm group 
who scored at the same 25th percentile on the pretest. 

The norm-referenced approach makes the equipercentile assumption that 
a group of similar students who are not enrolled in the bilingual 
instructional program will maintain the same percentile rank 
throughout the year. This does not mean that the group without 
bilingual instruction is not learning. It simply means that their 
learning rate keeps them at a similar position relative to other 
students in their grade. In contrast, the students in the bilingual 
program will hopefully learn faster than they would without the 
program. 

The question therefore being addressed is, "Do the students in the 
bilingual program increase their percentile ranking as compared to a 
national norm group who began at the same percentile?" 
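
As a rough illustration of the equipercentile comparison, the sketch 
below treats the pretest percentile as the expected posttest percentile 
for similar students without the program, and reports the difference. 
The function name and the example percentiles are hypothetical. 
Percentile ranks are not an equal-interval scale, so the size of the 
difference is best read as descriptive only. 

```python
def percentile_gain(pretest_percentile, posttest_percentile):
    """Change in percentile rank relative to the no-program expectation.

    Under the equipercentile assumption, similar students without the
    program are expected to stay at their pretest percentile, so the
    expected posttest percentile equals the pretest percentile.
    """
    expected_posttest = pretest_percentile  # equipercentile assumption
    return posttest_percentile - expected_posttest


# Hypothetical group result: 25th percentile at pretest, 32nd at posttest.
gain = percentile_gain(25, 32)
print(f"Gain over the norm-group expectation: {gain} percentile points")
```

A result of zero would mean the program group simply kept pace with 
the norm group that started at the same percentile. 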

Key Comparisons to Be Made 

There are many comparisons of performance that can be made. However, 
the five comparisons which follow are the ones that the evaluator may 
find useful and can be performed without using complex statistical 
procedures. 

1. Are the students in the bilingual program making 
gains? 

2. Is this year's student performance an improvement 
over past years? 

3. Are students meeting the objectives of the 
program? 

4. Are students doing better in the bilingual program 
than in another program? 

5. Are students doing better than they would be 
expected to do without the program? 

The first two questions can be answered easily by 
applying the basic design and using a norm-referenced test. The other 
comparisons require adding one or more of the options described 
earlier, such as a comparison group of students from another program. 

The first question, "Are the students in the bilingual program making 
gains?" can be answered by administering a norm-referenced test (NRT) 
of English language skills and comparing the bilingual student 
posttest scores with those of the norm group provided by the NRT. 
Answering this question will provide sufficient information to meet 
the Federal requirements. 

The second question, "Is this year's student performance an 
improvement over past years?" can be answered by comparing the gains 
of the students in the program each year, taking into account the 
error of measurement. When making this comparison, it is very 
important to realize that it may not be easy to determine why the 
change from one year to the next occurred. However, other data from 
the evaluation (the program description and monitoring of program 
operations) could provide some clues for the observed change. 
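
One simple way to keep error in view when comparing two years is to 
report each year's mean gain together with a standard error, as in the 
sketch below. This is an assumption about how the comparison might be 
carried out, not a procedure prescribed by the Handbook. The standard 
error used here reflects student-to-student variation in gains; the 
gain values are hypothetical. 

```python
import math
import statistics


def mean_and_se(gains):
    """Mean gain and its standard error for one year's students."""
    mean = statistics.mean(gains)
    se = statistics.stdev(gains) / math.sqrt(len(gains))
    return mean, se


def compare_years(gains_last_year, gains_this_year):
    """Difference in mean gains and the standard error of that difference."""
    mean_last, se_last = mean_and_se(gains_last_year)
    mean_this, se_this = mean_and_se(gains_this_year)
    difference = mean_this - mean_last
    se_difference = math.sqrt(se_last ** 2 + se_this ** 2)
    return difference, se_difference


# Hypothetical gain scores (posttest minus pretest) for two program years.
difference, se_difference = compare_years([2, 4, 6], [5, 7, 9])
print(f"Difference in mean gain: {difference}, standard error: {se_difference:.2f}")
```

A difference that is small relative to its standard error is hard to 
distinguish from noise and should not be read as a real change in the 
program. 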


The third question is, "Are students meeting the objectives of the 
program design?" This is both the most difficult and easiest question 
to answer. The difficulty comes in deciding what the goal level 
should be. To establish a realistic goal, the program staff and 
others need to carefully review the present skill level of the 
students; the amount and type of instruction required to make a 
certain change in student achievement; the motivation of students, 
staff, and parents to implement the change; the accuracy of the 
assessment instrument; and other similar conditions. Based on this 
information, the desired performance level on a test or other 
assessment device can be established. 
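
Once a goal level has been set, checking it is straightforward 
arithmetic. The sketch below counts the share of students at or above 
a goal score; the scores and the goal of 25 are hypothetical values 
chosen only for illustration. 

```python
def percent_meeting_goal(scores, goal_level):
    """Percentage of students scoring at or above the goal level."""
    met = sum(1 for score in scores if score >= goal_level)
    return 100.0 * met / len(scores)


# Hypothetical posttest scores and a hypothetical goal level of 25.
scores = [10, 20, 30, 40]
print(f"{percent_meeting_goal(scores, 25):.0f}% of students met the goal")
```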

The fourth question, "Are students doing better in the bilingual 
program than in another program?" must be answered by first 
identifying the other program to be compared to the bilingual program. 
For example, there may be an alternate program in the school designed 
to teach skills similar to those being taught in the bilingual 
program, but using a different teaching method. Or, some schools in a 
district may be using one method of instruction, and other schools a 
second method. A comparison of these programs may be of interest. In 
order to make such a comparison, the groups must be comparable, or a 
plan to statistically adjust the results must be developed. It is 
recommended that comparability of the two groups be established prior 
to any comparison, because statistical adjustments of dissimilar 
groups require complicated and sophisticated analytic procedures, 
which are not generally available. 
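
One way to look at comparability before any outcome comparison is to 
express the difference in pretest means in standard deviation units, 
as sketched below. The choice of this particular index is an 
assumption, the scores are hypothetical, and the Handbook gives no 
cutoff for how small the difference must be; that remains a judgment 
for the evaluator. 

```python
import math
import statistics


def standardized_pretest_difference(program_pretest, comparison_pretest):
    """Difference in pretest means, in pooled standard deviation units."""
    n1, n2 = len(program_pretest), len(comparison_pretest)
    var1 = statistics.variance(program_pretest)
    var2 = statistics.variance(comparison_pretest)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    mean_difference = statistics.mean(program_pretest) - statistics.mean(comparison_pretest)
    return mean_difference / pooled_sd


# Hypothetical pretest scores for the bilingual program and another program.
program = [12, 14, 16]
comparison = [10, 12, 14]
print(f"Pretest difference: {standardized_pretest_difference(program, comparison):.2f} SD")
```

A value near zero suggests the groups started at about the same 
level on the pretest; it says nothing about other ways in which the 
groups may differ. 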

The final question, "Are students doing better than they would be 
expected to do without the program?" can be answered by the 
information from the first question. It is assumed that students are 
enrolled in bilingual programs because they need instruction in both 
languages. Therefore, if they did not have access to these services, 
they would probably not learn as well. If the data show that they are 
achieving, then they are doing better. 

Many other questions that involve comparisons by race, past 
achievement level, socioeconomic status, etc., are not addressed 
here because they would be either very costly or very difficult to 
measure. Programs attempting to make other comparisons should 
approach the exercise with caution. 

Selecting Appropriate Tests to Measure English Language Skills 

The criteria for selecting an achievement test to measure English 
language skills in a bilingual program are the same as those used in 
selecting a test for any evaluation. However, some criteria are more 
difficult to meet because few tests have been developed with the needs 
and characteristics of bilingual students in mind. Note also that a 
major assumption is made about the measurement of the English language 
component -- that the students learning English language skills have 
enough English language facility so that testing can occur in English. 
If this is not true, the students are likely being instructed in their 
native language and they are acquiring language skills in that 
language. 

The basic evaluation design recommends the use of a standardized, 
norm-referenced test (NRT) of reading and other language skills to 
evaluate the English language component. Most school districts now 
routinely administer one of these tests to all students. If the 
district does not use a norm-referenced test (NRT) and NRT scores are 
not readily available, the evaluator may choose to select one of the 
tests described in the Technical Appendix. These tests are 
reasonable, reliable, and valid. The main concern should be that the 
test content matches the program curriculum, at least on a general 
level. If this basic check is not made, it may later be discovered 
that the second-grade test covers third-grade curriculum, and vice 
versa. 

In an evaluation using a norm-referenced test, the norm group is used 
as the comparison group for the bilingual program students. 
Therefore, it is preferable for the norm group to be as similar as 
possible to the program students. Most available norms for a given 
grade, however, are designed to be representative of the U.S. 
population as a whole. Some tests may have norms for different 
regions of the country or for special educational programs, such as 
ESEA Title I programs. Norms established for students in Title I 
programs may be similar to the norms of students in the bilingual 
program, since their students may reflect similar socio-economic 
backgrounds. 

Finding a test with norms that are comparable to the bilingual 
students is unlikely, but having an idea of the nature of the 
differences will help in interpreting the final results. In addition, 
the test should have norms that are as current as possible. If norms 
are over 5 or 10 years old, the students were probably experiencing a 
significantly different curriculum or instructional method than the 
bilingual students currently being tested. 

There are two major problems to consider in selecting NRTs. These 
are: 

Test level (floor and ceiling effects). In some 
bilingual programs, the at-grade-level test is too 
difficult for program students at pretest. The 
next lower level may be too easy at posttest time. 
If the mean score on a test is less than 25% of 
the items correct or more than 75% of the items 
correct, floor or ceiling effects probably exist, 
and the test cannot give an accurate picture of 
either student performance or program impact (See 
Out-of-Level or Functional Level Testing in the 
Technical Appendix). 
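
The 25% and 75% rule of thumb above is easy to check from raw scores. 
The following sketch assumes every student took the same test level 
with the same number of items; the function name, raw scores, and 
40-item test length are hypothetical. 

```python
def floor_or_ceiling_flag(raw_scores, number_of_items):
    """Flag likely floor or ceiling effects from the mean proportion correct."""
    mean_proportion = sum(raw_scores) / (len(raw_scores) * number_of_items)
    if mean_proportion < 0.25:
        return "possible floor effect"
    if mean_proportion > 0.75:
        return "possible ceiling effect"
    return "no floor or ceiling effect indicated"


# Hypothetical raw scores on a 40-item test.
print(floor_or_ceiling_flag([5, 8, 10], 40))
print(floor_or_ceiling_flag([20, 22, 18], 40))
```

Running the check separately on pretest and posttest scores helps 
show whether a test level that fits at one testing still fits at the 
other. 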

Multi-year and multigrade-level requirements. Most 
bilingual programs cover several grade levels. 
Therefore, it is desirable to have achievement tests 
that can be used to compare progress across grades and 
that can be used to follow groups of students as they 
progress through the grades. In practice, this means 
using any one of the recognized achievement tests. 

Guidelines for Using Norm-Referenced Tests -- The following guidelines 
for using norm-referenced tests (NRTs) should be adhered to in order 
to produce a valid evaluation. 

1. Do not use the same test score to select students 
for the bilingual program as the pretest score. 
Doing so tends to overestimate the impact of the 
program. The pretest and selection test scores 
can be separated by: 

o Administering separate tests; 

o Using last year's posttest scores as this 
year's selection scores; 

o Using different subtests of the same test 
battery -- one to select students and one as 
the pretest (both subtests, of course, need 
to be related to the objectives of the 
project); and 

o Readministering the same test used for 
selection as the pretest. 

2. Tests should be commensurate with the development 
and skill level of the students. 

3. Use the same test form for pretesting and 
posttesting. (Test forms have the same difficulty, 
but contain different, although comparable, 
items.) 

4. If a norm-referenced test is used, testing should 
occur within two weeks before or after the date on which the 
publisher actually administered the test to a 
national sample for norming purposes. These 
empirical norm dates differ from projected norms 
-- norms which are merely estimates of 
performance. Testing done at the same time as 
that for the norm group provides more accurate 
comparisons. Deviations from the norm dates 
should be in the same direction and magnitude for 
both pretest and posttest. That is, if pretesting 
occurred a week before the norm date, the same 
should be true for the posttest. 

Using CRTs (etc.) for Evaluating the English Language Component -- The 
choice of CRTs (etc.) is more of a curriculum decision than an 
evaluation decision in most districts. That is, when developing 
objectives and curriculum materials for a bilingual program, many 
districts either develop or buy tests matched to their curriculum and 
the instructional materials. These tests are the best candidates to 
use in your evaluation. If you have important objectives for student 
performance that are not covered by any other tests, you may wish to 
develop or buy special tests just for evaluating student outcomes. 

Cautions for CRT Users -- If teachers keep good records of the number 
of students passing each test and the dates on which they pass, these 
records will provide a form of absolute student performance measure, 
as well as a progress record over the course of the year. The records 
are interesting in their own right, and can also be compared from year 
to year. Often, however, such tests are weak in the characteristics 
required for outcome evaluation (high reliability and validity, plus 
adequate floors and ceilings), so they should be viewed as ballpark 
measures that include a lot of noise (error), and they should be 
interpreted with great caution. In short, our recommendation is to 
look at the results from CRTs (etc.), but be careful. 

3. Evaluating the Non-English Language Component 



For evaluation purposes, bilingual programs can be divided into three 
types based on their non-English language component. These are: 

o Spanish-only programs; 

o Single-language programs other than Spanish; and 

o Multiple-language programs. 

The major differences among these three types of programs, from the 
evaluator's perspective, are: (a) only Spanish-English programs will 
find commercial tests readily available, and (b) multiple-language 
programs often include small groups that cannot be combined for 
evaluation purposes. 

Three Basic Design Decisions 

The three basic decisions made for the English language component also 
apply to the non-English language component: (a) which students? (b) 
what tests? and (c) what time period? However, the decisions are even 
simpler for the non-English language component, because there are 
fewer alternatives available to the evaluator. The basic options can 
be summarized as follows: 

o Which students? In general, only the bilingual 
program students will speak the languages in 
question, and therefore they are the only students that can 
be included in the evaluation. In a few 
districts, there may be comparison groups of 
interest from other programs or other districts 
who use the same tests. However, in most cases, 
only your program students will be tested in the 
non-English language, making comparison groups 
unavailable. 

o Which tests? A limited number of standardized 
tests are available in Spanish (although their 
norm groups are not analogous to those from 
English-language tests, and you should not use the 
norms as a simple standard of comparison). For 
other languages, you are limited to, at best, a 
few commercial criterion-referenced tests, plus 
locally-made tests (CRTs, etc.). 

o What period of time? Here, the evaluator has the 
option of single-year or multi-year designs since 
baseline data before the start of a new bilingual 
program could be collected. However, in practice, 
few districts will do this. In general, if the 
English language evaluation is multi-year, the 
non-English language evaluation should also be 
multi-year. Otherwise, both should be single-year 
evaluations. 



The decision on once-a-year (Spring) versus twice-a-year (Fall, Spring) 
testing will probably also be the same for non-English testing as for 
the English language testing. 

The basic choices are summarized below. 



                          Basic Evaluation      Optional Additions 

1. Students               Program only          None from 
                                                the district 

2. Tests                  CRTs, etc.            None 
                          (NRTs for Spanish) 

3. Term of Evaluation     Single year           Multi-year 

   (Time of Testing)      Spring only           Fall only 
                                                Fall and Spring 



How to Select Among the Options -- As you can see, the only real option 
is whether to include the non-English language component in the 
evaluation at all. If you want to know how your students are doing in 
this area, you will almost certainly be able to produce teacher-made 
tests that will serve your purposes, but you need to consider exactly 
which questions you can answer with such tests. 

Type of Performance That Can Be Measured 

At first glance, it might appear that the evaluation is only able to 
answer absolute student performance questions for the non-English 
language component. However, there is one key difference between 
English and non-English language performance that lets the evaluator 
consider program impact questions as well. It is a fact that most 
students improve their English whether or not they are in a bilingual 
program. Therefore, the burden of proof in program impact evaluations 
falls on the evaluator to show that the students do better in the 
bilingual program than they would have done without it. In evaluating 
the non-English language component, however, the evaluator is probably 
safer in assuming that students would learn little, and that little or 
no reading or writing would occur, without the bilingual program. 
Therefore, the evaluator may be able to argue that the program is 
largely responsible for any level of performance they achieve. With 
this in mind, the options and the questions that can be answered for 
the non-English language component are depicted below. 

                  Type 
                  of Test    Single-Year Evaluation    Multi-year Evaluation 

Test Program      CRT        Absolute Performance: 
Students Only     (etc.)     Mastery of lesson 
                             content 

                             Relative Performance:     Relative Performance: 
                             Compared to no            Improvement 
                             program                   over time 

Key Comparisons to be Made 

The key comparisons that can be made relative to non-English or first 
(L1) language skill development/performance can be the same as those 
made for the English language component. Performance measurement 
against norms will only be possible for Spanish language performance. 
Therefore, answering the first comparison question for other languages 
will have to be done by using locally developed mastery tests. 

Answering the other questions may be done by following the same 
procedures as before. Answering the fourth question, which requires a 
comparison group, should not even be attempted. 




Selecting Tests for the Non-English Language Component 



Selecting tests for this component is difficult because there are very 
few tests available. Spanish versions are available for the 
Inter-American Tests, the CTBS, and the ETS Circus test. However, 
conventional non-English language norms do not exist. The 
Inter-American Tests (Spanish) provide user-norms based on students in 
bilingual programs using that test. The norms provided with the 
Spanish CTBS do not represent the population of Spanish/English 
bilingual students. Norms for both tests can only provide comparison 
standards for student performance evaluation, and these comparisons 
are difficult to interpret. So far as the review of literature 
indicated, no large-scale norm groups have been tested in any other 
languages. 

Commercial Tests for Languages Other Than English -- The first two 
Spanish language tests mentioned above, while not conventional 
norm-referenced tests, are similar in terms of reliability and 
validity to other standardized tests. The El Circo (Spanish) test for 
primary students also represents a high degree of development. All 
three can be used for measuring student performance and comparing 
program students from year to year. Standardized tests to measure 
language achievement, particularly in the first language, may be 
difficult to find. In this situation, it would be appropriate to 
utilize criterion-referenced measures or teacher-made tests. 

Must English and Non-English Language Tests Come From the Same 
Publisher? -- This question applies mainly to Spanish-English 
programs, since few tests are available in other languages. While 
there are some advantages to dealing with a single test publisher, it 
is more important to get the most appropriate tests in each language. 
Limiting choices to tests that are published in two languages is an 
unnecessary restriction. 



Teacher-Made Tests -- For languages other than Spanish, many projects 
will have to depend on teacher-made tests. These tests should be 
quite adequate for demonstrating that students are gaining skills in 
their non-English languages. In general, they will not be adequate 
for measuring small, year-to-year changes in program effectiveness. 

4. Evaluating Student Performance in Academic Areas 

Evaluation of performance in academic areas requires the specification 
of the skills to be assessed, selection of the language in which 
skills are to be measured, and the identification of appropriate tests 
in English and/or the first language of the student. The evaluator 
will need to determine which skill areas are to be included in the 
evaluation. Measurement of achievement in literacy as well as in 
major academic subject areas may be appropriate. This determination 
will have to be made on a program-by-program basis. If a student is 
not literate in L1 or L2, then achievement testing will not be 
appropriate. If the students are literate, the language in which to 
test the students will depend upon the language in which instruction 
in the particular subject has been given, as well as the fluency of 
the student in that language. 

The Basic Design 

Many bilingual programs include non-language, academic subjects, such 
as math, social studies, and science. The same principles that apply 
to the English language component apply to this component if testing 
is done in English. A minimal evaluation would consist of (a) testing 
program students only, (b) using standardized, norm-referenced tests, 
and (c) a single-year design. Options include local comparison 
groups, longitudinal designs, and baseline data. 

Language of Testing 

The major issue in evaluating performance in academic subject areas is 
whether to test in English or in the first (L1) language. The 
evaluation will be easier to implement and the results easier to 
interpret if the testing is done in English. However, as a matter of 
common sense, if the students are weak in English and much stronger in 
their native language (e.g., new arrivals or young children from 
non-English-speaking homes), then testing in the native language may be 
required. In such cases, the evaluation design principles for 
non-English language components apply (see above). 

Selecting NRTs 

By and large, the discussion of tests for English language also 
applies to tests for academic subjects tested in English. The 
discussion of non-English language tests applies to tests of math, 
science, etc. in non-English languages. The basic rule here, as it 
was for English language, is to utilize the test that is used 
throughout your district. The Technical Appendix contains a 
discussion on the selection of achievement tests, as well as a listing 
of these tests for testing language, mathematics, science, etc. 

Using CRTs (etc.) 

As in language testing, if you have test data available from your 
instructional program on math, science, or other subjects, you may 
want to include these data in your bilingual program evaluation. For 
subjects tested in languages other than English or Spanish, you may 
have to depend on teacher-made tests, and the normal cautions apply. 

5. Evaluating Affective Areas of Student Performance 

Affective goals, like improving student attitudes or behaviors, are 
mentioned in connection with many bilingual programs. If your 
program has specific objectives in these areas and if the program 
includes specific components that are intended to change student 
attitudes or behaviors, then you should consider evaluating the 
effects of these components. However, you should be aware of two 
problems, which are discussed below. 

Affective goals must be clearly defined. In many bilingual programs, 
the non-academic goals are defined in very general terms, such as 
"improving self-concept." The test chosen to evaluate changes in 
self-concept may be some readily available commercial attitude test 
that bears very little relationship to the self-concept of the program 
students. The results are almost certain to be meaningless. 



If you wish to evaluate affective components of your program, then you 
must define the goals clearly, describe the components of the program 
that are intended to address the goals, and then identify appropriate 
measures, such as tests, attendance records, and so on, that match 
your goals. Then you can begin to consider an evaluation design to 
evaluate absolute student performance, relative student performance, 
and program impact in the areas that you have defined. 

Affective goals are very difficult to evaluate. While the general 
evaluation design principles apply theoretically, in practice it is 
very difficult and frustrating to evaluate changes in attitudes, 
self-concept, and so on. This is because (a) there is a great deal of 
noise in the measurement, (b) most measures are insensitive to change 
in attitudes, (c) attitudes change greatly from month to month and 
even from hour to hour, (d) there are few good absolute criteria 
available, and (e) there are seldom any very good comparison groups 
available. 

The net result is that few evaluations can provide convincing evidence 
of changes in attitudes or related characteristics of the students. 
For this reason, we would not advise bilingual programs to invest much 
of their effort in evaluating these goals unless they are a major 
focus of the program. 

Programs wishing to measure affective areas may consult the Technical 
Appendix. This volume contains a discussion of self-concept and a 
listing of different tests available. 

6. Conducting the Data Collection Activity 



Data collection for the first component of the evaluation, program 
operations, consists of obtaining student background information, 
interviewing teachers, program administrators, and parents, as well as 
observing classroom operations. Data collection for evaluating student 
outcomes consists of test administration, scoring, and the recording of 
test scores. The latter activity probably requires a higher level of 
effort than the former. However, data collection for the student 
outcome component requires strict discipline and very precise 
procedures. 



Testing the Students 

Testing in the academic program areas -- language, math, science, and 
so on -- requires the same basic procedures. The main distinction 
that the evaluator should make is among formal testing for 
evaluating student outcomes, informal testing for diagnostic or 
other instructional purposes, and out-of-level or functional level 
testing. Each type of testing is discussed separately below. 

Formal Testing for Outcome Evaluation -- Standardized, norm-referenced 
tests should always be administered and scored under carefully 
controlled conditions. If you are serious about using CRTs, 
teacher-made tests, or any other kinds of tests for purposes of 
outcome evaluation, the same rules apply. Most of these rules are 
familiar to all teachers. Two points deserve special mention. For 
experienced testers using a familiar test, it is sufficient to bring 
the group together briefly within a few days of the beginning of 
testing to review the tests and testing procedures. For new tests or 
inexperienced testers, each tester should practice administering the 
entire test under the supervision of the evaluator. 

Testing should be done within a few days of the same date each year. 
For norm-referenced evaluations, the testing should be within a week 
or two of the time that normative data were collected by the test 
publisher (or local district). 

Informal Testing for Instructional Purposes -- In previous sections, 
the Handbook suggests including the results from CRTs, teacher-made 
tests, and so on in the outcome evaluation. In some areas, such as 
non-English subjects, these may be the only test results that you 
have. The problem is that many of these tests are given under 
informal classroom conditions. For example, progress checks or mastery 
tests are often taken by individual students while the teacher works 
with other students in the same classroom. 

The simple fact is that when you give tests under informal conditions, 
you can expect a lot more noise (error) in the scores than if the same 
tests were given under carefully controlled conditions. In general, 
you will have to choose, at least to some extent, between 
instructional and evaluation uses for your tests. Tests that are 
given informally in the classroom will provide only very rough 
measures of student outcomes. 

Out-of-Level or Functional-Level Testing -- Achievement tests provide 
useful information for evaluating student performance. The value of 
such information is obviously related to its accuracy. Achievement 
tests are designed to accurately measure the achievement level of 
average students in a certain grade level. However, they may not 
accurately assess the achievement level of all students at that grade 
level. 



A student's functional level, at test time, may be below a test 
publisher's recommended test level. This is often suggested by a very 
low test score on a recommended test level and may indicate that 
guessing (chance) by the student played an important role in the 
result. Therefore, students whose scores are primarily a result of 
guessing on a test that is too difficult may need to be tested out of 
level. That is, they need to be tested with an easier, lower level of 
the test. 

Functional-level testing, therefore, involves testing students with 
test levels most appropriate to their achievement levels. 
Functional-level testing can involve testing students with the 
recommended test level (in-level testing), or it can mean testing 
students with a test level below or above the recommended level 
(out-of-level testing). Whatever the case, the goal is to test at a 
level affording the students the most opportunity to demonstrate their 
abilities. The Technical Appendix contains a more detailed 
explanation of when to use out-of-level testing, as well as how to 
properly conduct the testing. 

Testing Procedures 

Testing procedures simply require following the exact instructions of 
the test and making sure that pre- and posttesting conditions and 
procedures are identical. Scoring and recording test data are subject 
to clerical errors. These errors, however, can be easily held to an 
acceptable level through adequate care and accuracy checks. Scoring 
procedures which require the scorer to make qualitative judgments 
about the adequacy of a response are more difficult to control. These 
qualitative judgments may involve more than simply deciding whether a 
response is correct or incorrect. 

The following guidelines should be followed during test 
administration. 

Guidelines for Administering the Testing 



1. Assembling the Students 

o Utilize similar testing conditions for all 
treatment and comparison groups. Consider the 
time, place, and date of test administration. 
Follow the technical manuals, which 
often contain testing procedure recommendations 
(e.g., avoid afternoon testing, or testing on 
Monday and Friday). 

o Distractions should be minimized. Avoid testing 
in the hall or in the cafeteria when lunch is 
being prepared. 

o Coordinate testing efforts with district testing 
or assessment policies and procedures. 

o Consider teaching test-taking skills to students. 
This includes acquainting students with test 
formats, etc. (NOT teaching to the actual test). 

o Plan for make-up testing. 



2. Administering the Test 

o Identify testers. If teachers do not speak the 
appropriate language, identify alternative 
testers. 

o Conduct in-service training for all test 
administrators. If aides and parents will be used 
for testing, more intensive training will be 
required for them. The items on the list below 
should be addressed: 

Familiarity with materials 

Clarity of presentation 

Adherence to guidelines and time limits 

Control in the classroom 

Attention to physical conditions (e.g., seat 
spacing) 

Practice for individual testing 

Correct choice of testing dates (e.g., 
norming dates) 

The need for the inevitable "fill-in" of 
absentees 

o Clearly define roles and responsibilities of 
testers. In-service training and determination of 
roles and responsibilities should be assertively 
coordinated by the program director. 

3. Scoring the Test 

o Train test scorers. 

o Scored tests should be spot-checked by someone 
other than the person scoring the test. 

o Check interrater reliability. 

4. Scheduling 

o Testing should be spread out over one or more days 
so that the burden on the students is not so 
great as to lower scores. Pre- and posttesting 
must follow similar schedules. 

Scoring of Test Data -- One of the issues in scoring tests and 
recording the scores is whether to use computers. If the program is 
very large, the answer should probably be "yes," at least for 
norm-referenced tests. Many programs have access to district, 
university, or state computer centers that can perform the scoring of 
the tests. If these services are not available locally, the test 
publishers or other scoring services can provide them. Hand scoring 
and recording may still have to be performed for very small programs. 
In addition, if non-standardized tests are used, it may be necessary 
to score the tests by hand before entering the scores into a computer 
for analysis. 

A second issue involves the way that scores are organized for 
recordkeeping purposes. Since a student may stay in a program for 
several years, be tested several times, and have several teachers, it 
will usually be necessary to keep individual student record files. 
However, for analysis purposes, it is desirable to group students by 
classroom. This will require keeping a second set of forms. This 
should not be a problem if the data are stored in a computer, since 
the computer can do the work of regrouping the records of the 
students. Commercial scoring services may be able to do this type of 
processing. Some commercial scoring services can provide complete 
analyses of the data, including comparisons across years, upon 
request. 
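
As a rough illustration of the regrouping described above, the sketch below takes individual student records and regroups them by classroom. It is only a sketch: the field names (student, classroom, year, nce) and the sample records are invented for the example and are not prescribed by the Handbook.

```python
from collections import defaultdict

# Hypothetical individual student records; field names and values are
# illustrative only and do not come from the Handbook.
records = [
    {"student": "A", "classroom": "Room 1", "year": 1, "nce": 42},
    {"student": "B", "classroom": "Room 2", "year": 1, "nce": 35},
    {"student": "C", "classroom": "Room 1", "year": 1, "nce": 51},
]


def group_by_classroom(rows):
    """Regroup individual student records into per-classroom lists."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["classroom"]].append(row)
    return dict(grouped)


by_class = group_by_classroom(records)
for classroom, rows in sorted(by_class.items()):
    print(classroom, [row["student"] for row in rows])
```

The individual records remain the single stored copy; the classroom grouping is produced on request, so no second set of forms has to be maintained by hand.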

The type of score utilized is very important. Never use grade 
equivalent scores for any purpose. Use normalized standard scores 
(preferably NCEs) for all computations and calculations of impacts. 
Report pre- and posttest performance in percentiles. The use of NCEs 
is explained in the section entitled "Analyzing Student Outcome Data," 
which follows this section on testing. 

Recording Test Data -- Recording the scores is the final step in the 
data collection process. To ensure that the scores will be usable, 
the details of recording should be planned well before pretest time. 
Where a commercial scoring service is used, the evaluator may have 
little control over the recording process, but if the program elects 
to do its own scoring or wishes to transfer scores from computer 
printouts to a more convenient form, the evaluator must consider two 
important issues: (a) the accuracy of the data, and (b) the details 
of the data recording forms. 

Copying scores accurately onto data forms is not a complicated 
problem. However, even the most conscientious recorders make errors. 
Therefore, all data forms should be carefully proofread, preferably 
with one person reading aloud while a second person checks the scores. 
Attention must also be given to data recording forms. Data forms 
might appear to be of little importance, but the way in which data 
have been recorded in many school districts virtually precludes any 
reasonable analyses. It is not possible to prescribe a standard data 
format because school requirements vary so widely, but it is possible 
to state two general principles which must be observed. First, data 
forms must completely identify all scores, and 
second, data forms must arrange data in a way that 
facilitates analysis. 



Recording Data for Multi-year Evaluations -- A data recording form 
that works well for a single fall-to-spring evaluation may not be 
suitable for following student progress over several years. Thus, 
data recording forms that allow for attrition, regrouping of classes 
each year, and the total number of scores must be developed and used 
for a multi-year evaluation. 



The following guidelines should be used for recording test data: 



Guidelines for Recording Test Data 



1. Most sets of scores will require more than one 
   page. Each page should carry a number identifying 
   the page and the total number of pages to ensure that 
   no pages used for collecting data will be missed. 

2. Every form containing important information should 
   have a name and date to indicate who filled in the 
   numbers in case any questions arise in the future 
   about the accuracy of the information. 

3. Each group for which data are recorded should be 
   clearly identified at the top of the data form to 
   simplify the retrieval of that group's data from a 
   large data base. 

4. Each page of forms containing student data should 
   be arranged so that it can be photocopied without 
   the students' names. This permits wide use of the 
   data for research purposes without compromising 
   student privacy. 

5. The analysis of data is simplified if only one set 
   of test scores (pre and post) is recorded on each 
   sheet. The rules for listing students (see points 
   6-11 below) should be followed. The complete name 
   of the pretest and posttest (taken exactly from 
   the test booklets and including publication date) 
   must be listed. 

6. Identifying and organizing student names 
   efficiently are the most difficult recording 
   problems. Single-year evaluations collecting data 
   through fall and spring testing should have 
   minimal problems. However, multi-year evaluations 
   that follow students over several years are a more 
   difficult task since students come and go from 
   projects, and groups are reorganized every year. 
   The simplest rule is to make sure that the 
   posttest scores are all entered on the same form 
   as the corresponding pretest scores. This at 
   least eliminates the problem of the evaluator 
   trying to find each student's name on two forms. 

7. A second rule for listing student names is to 
   establish a standard for listing of names and use it 
   for the life of the evaluation and for all tests 
   that are used. If a student moves or fails to 
   take some of the tests, then the appropriate 
   entries are left blank, but the student's name 
   should not be eliminated from the list. If new 
   students enter the program, their names should be 
   added to the end of the list for all tests, even 
   those for which no data will be entered. If there 
   is a compelling reason to change the order of 
   student names in the middle of a project, then 
   either all forms should be changed, or a double 
   set of forms (old and new order) should be 
   maintained. 

8. A rule should be established for recording names. 
   The simplest procedure is to allow plenty of space 
   and to spell out first names and middle initials 
   (e.g., Caldwell, Daniel E.). 

9. Each student should have an individual ID number 
   that identifies the student. For example, use a 
   one-digit number to identify an experimental 
   condition, a two-digit number to identify a group 
   or class, a one-digit sex code, and a two-digit 
   student number. In some evaluations, other codes 
   (including letters) can be used, but careful 
   consideration of the situation is necessary in 
   order to permit any desired grouping simply by ID 
   number. 

10. A page on any form should have some reasonable 
    number of entries, probably 20 or 25. The same 
    number of entries per page will facilitate the 
    analysis of the data. 

11. Test dates are critical, especially in 
    norm-referenced evaluations. If all students 
    listed on a form have their pretests in one day 
    and all are later posttested in a single day, then 
    test date information is not really necessary. 
    However, this is usually impossible to predict at 
    the time the form is made up, so the columns 
    should be made to provide space to indicate the 
    dates of make-up tests and late entries into the 
    program. 

12. Pre- and posttest scores should, in general, be in 
    adjacent columns, rather than pairing each pretest 
    raw score with its standard score, percentile 
    score, etc., followed by each posttest score and 
    its transformations. 
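
The student ID number suggested in the guidelines (a one-digit experimental condition, a two-digit group or class, a one-digit sex code, and a two-digit student number) might be built and taken apart as sketched below. The function names, the order of the fields within the six digits, and the sample code values are assumptions made for illustration; the Handbook gives only the field widths.

```python
from typing import NamedTuple


class StudentID(NamedTuple):
    condition: int  # one digit: experimental condition
    group: int      # two digits: group or class
    sex: int        # one digit: sex code
    student: int    # two digits: student number within the group


def make_id(condition, group, sex, student):
    """Build a six-digit ID string; the field order is an assumption."""
    if not (0 <= condition <= 9 and 0 <= group <= 99
            and 0 <= sex <= 9 and 0 <= student <= 99):
        raise ValueError("field out of range for its digit width")
    return f"{condition:d}{group:02d}{sex:d}{student:02d}"


def parse_id(code):
    """Split a six-digit ID string back into its four fields."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit ID")
    return StudentID(int(code[0]), int(code[1:3]), int(code[3]), int(code[4:6]))


# Hypothetical example: condition 1, class 07, sex code 2, student 15.
example = make_id(1, 7, 2, 15)
print(example, parse_id(example))
```

Because every field sits in a fixed position, students can be grouped by condition, class, or sex simply by sorting or filtering on the ID, which is the property the guideline asks for.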
7. Analyzing Student Outcome Data 



The analysis of the student outcome data should be performed or at 
least supervised by a trained evaluator. The analysis of student 
performance data should simply answer the questions which the 
evaluation was designed to answer and make the necessary comparisons 
that were established during the evaluation design phase. There are 
three steps in this approach: 

o Examine scores for serious mistakes or unusual 
results. The scores can be examined simply by 
drawing the frequency distribution of test 
scores. If two sets of scores are being compared 
for the same students (for example, second-grade 
and third-grade scores), then scatter diagrams of 
one test against the other should be used. 

o Compute the mean scores and standard deviations 
for program (and comparison) students. If the 
scores do not appear to reflect any serious 
problems or unusual program effects, then simply 
compute the mean score for each group of program 
students (and for each group of comparison 
students, if any). The standard deviation (a 
measure of how spread out the scores are) must 
also be calculated and reported for each group. 
The mean scores are used to draw comparisons or 
look for progress of the students. 

o Estimate the possible effect of error on your 
results. What may appear to be changes in student 
performance may only be random changes in the 
scores due to noise (error). Errors in mean 
scores of 5 to 10 NCEs are not uncommon, 
especially with small groups of students. 
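
The first two steps above can be sketched in a few lines. The scores below are made up for illustration, the 10-NCE band width is an arbitrary choice, and the sample standard deviation (dividing by n - 1) is an assumption, since the Handbook does not say which form to report.

```python
from collections import Counter
from statistics import mean, stdev


def frequency_distribution(scores, width=10):
    """Count scores in bands of `width` NCEs (e.g. 30-39, 40-49)."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


def summarize(scores):
    """Group size, mean, and sample standard deviation for one group."""
    return {"n": len(scores), "mean": mean(scores), "sd": stdev(scores)}


# Made-up NCE scores for one group of program students.
program_nces = [28, 35, 41, 33, 39, 47, 30, 36, 44]

bands = frequency_distribution(program_nces)
summary = summarize(program_nces)

# A crude text histogram is enough to spot odd shapes or stray values.
for lower, count in bands.items():
    print(f"{lower:2d}-{lower + 9:2d} {'*' * count}")
print(summary)
```

The same two functions would be run once for each program group and each comparison group, so that every reported mean is accompanied by its group size and standard deviation.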

In examining the data from the evaluation, the evaluator should check 
to see if the distribution of scores resembles a normal curve (bell 
shaped). If the distribution of scores is a different shape, this 
could indicate possible problems with the tests, testing procedures, 
the scoring procedures, or the computer programs used to process the 
data. An abnormal 
distribution in the data may also be attributable to the effects of 
the program on specific students. For example, in one bilingual 
program, the mean scores could show second grade students making a 
moderate percentile or normal curve equivalent (NCE) gain in reading. 
However, when individual students' scores are analyzed, it may be found 
that only a few students in that grade have made very large gains 
while the rest of the students have made little or no change in their 
percentile standings. This information is useful to the evaluator in 
concluding that the program is working for some students but not for 
others. Using this finding, the program director may be able to adjust 
the program for those students not showing improvement in reading. 



Another problem in analyzing the data from the evaluation is the kinds 
of noise (error) that remain in even the best evaluation data. 
Care should be taken to ensure that changes in students' test 
scores are due not to noise but to the effects of the program. 
Errors in mean scores of 5 to 10 NCEs are not uncommon, especially for 
programs with small numbers of students. Tests of statistical 
significance provide the best way of estimating the likelihood that 
the results are simply examples of random error. However, tests of 
statistical significance do not provide information about the 
educational importance of results, since small gains can be 
statistically significant for large groups of students, while what 
appear to be large gains can be due to random error with small groups 
of students. Tests of statistical significance also will not indicate 
flaws in your evaluation procedures. Thus, individuals responsible 
for conducting the evaluation should look for possible problems in the 
evaluation procedures. In order to better understand this issue, the 
following information is presented. 

An Analogy: The Signal-to-Noise Ratio 

It is generally accepted that test scores include some measurement 
error, and that student performance is affected by many things outside 
of the program. To use the popular term from the stereo recording 
industry, these various kinds of errors can be thought of as the 
"noise" in any test score. To pursue the analogy, think of the true 
changes in student performance (which may or may not represent impacts 
of the bilingual program) as the "signal" in the test score, just as 
the music is the signal on a stereo tape or record. If there is a lot 
of noise in the stereo system, very soft passages of music will be 
lost in the hiss and static, although very loud passages may be quite 
clear. In the same way, if there is a lot of noise in an evaluation, 
small changes in student performance will be obscured, even though 
dramatic changes would show up quite clearly. 

Can the Signal Be Separated From the Noise in an Evaluation? 

The important issues for anyone involved in evaluation are (1) how 
much noise is there in a carefully done evaluation? and (2) can 
changes be expected in students (or impacts due to the program) that 
are big enough to stand out from the background of noise? 

To oversimplify, the answer depends on both how well the evaluation is 
done and on the evaluation questions being addressed. It is probably 
safe to say that in the vast majority of program impact evaluations 
(for all kinds of programs, not just bilingual programs), the amount 
of noise will be significant. On the other hand, questions that ask 
only about student performance can usually be answered quite well. 
Finally, even program impact questions can be answered in some 
districts where conditions and resources permit. Before getting more 
specific, however, we must pick a type of test score or "unit of 
measurement" that we can use to discuss the size of effects and the 
amounts of noise in program evaluation. 

Selecting a Unit of Measurement: The Normal Curve Equivalent (NCE) 

The type of test score that we will use is called the Normal Curve 
Equivalent, or NCE. Like any type of score that we might pick, the 
NCE has both good and bad features. Perhaps the worst is that it is 
unfamiliar to many educators. On the positive side, however, NCEs 
have many technical properties that make them useful in evaluations. 
They have been adopted by many evaluators in the last few years, and 
many standardized test manuals now include tables for converting to 
NCEs. 

Basically, NCEs are one of the many varieties of normalized standard 
scores (others include stanines and T-scaled scores). Like all 
standard scores, they are generated by the test publishers from norm 
group data, so they relate student performance to a nationally 
representative group of students. 

Comparing NCEs and Percentiles — The NCE scale runs from 1 to 99 like 
the percentile scale (see Figure 1). In fact, an NCE of 1 is 
equivalent to the 1st percentile of the national norms, and an NCE of 
99 is equivalent to the 99th percentile. Similarly, an NCE of 50 
represents the mean of the national norm group, just as a percentile 
of 50 does. However, there are important differences. According to a 
popular model of student skills, each percentile unit at either end of 
the scale represents a large increment of skill, while a percentile in 
the middle of the scale represents a small increment of skill. For 
example, a student who wants to raise his or her score from the first 
to the second percentile (or from the 98th to the 99th percentile) 
must learn about 15 times as much as the student who goes from the 
49th to 50th percentile. 



This means that the number of percentile points that a student or a 
class improves does not tell us much unless we also know the starting 
point. NCEs, on the other hand, cover the same range but divide the 
range into 99 equal units in terms of skills. Thus, if we say that a 
student gains one NCE, we can assume that it always means the same 
thing regardless of where the student started on the scale. 
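
In practice, the publisher's norms tables should be used for converting scores. Purely to illustrate how the two scales relate, the sketch below uses the standard definition of the NCE as a normalized score with a mean of 50 and a standard deviation of 21.06; that constant is the conventional value and is not stated in the Handbook text. The function names are invented for the example.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional NCE standard deviation (not from the Handbook text)


def percentile_to_nce(percentile):
    """Convert a national percentile rank (between 0 and 100) to an NCE."""
    z = NormalDist().inv_cdf(percentile / 100)
    return NCE_MEAN + NCE_SD * z


def nce_to_percentile(nce):
    """Convert an NCE back to a national percentile rank."""
    return 100 * NormalDist().cdf((nce - NCE_MEAN) / NCE_SD)


for pct in (1, 20, 50, 80, 99):
    print(pct, round(percentile_to_nce(pct)))
```

The two scales agree at 1, 50, and 99 and differ in between; the 20th and 80th percentiles, for instance, come out at NCEs of about 32 and 68.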

[Figure 1. Area under the normal curve divided into NCEs, 
percentiles, stanines, and standard deviation units.] 

Measuring Gains in NCEs -- One last point about NCEs is important 
here. This is the difference between raw score gains (i.e., 
improvements in the number of items answered correctly) and NCE gains 
(improvement in relation to the national norm group). If we had a 
test that covered several grade levels and we gave it to program 
students each spring, we would expect their raw scores to go up each 
year. However, we would not necessarily expect their NCE scores to go 
up. For example, let's say a student is exactly at the mean of the 
national norm group for his or her grade level (NCE = 50). The next 
year, our student's raw score will almost certainly go up, but so will 
the scores of all the other students of the same age. All things 
being equal, we would expect our student to stay at the mean of the 
norm group, so the NCE score would still be 50. Discounting any error 
in the score, any change from an NCE of 50 would indicate that our 
student was learning faster (or slower) than the average student in 
the norm group. This could be due to an unusually effective school 
program or to ways in which our student (or community) differs from 
those in the national norm group. 

In Practice, How Big is an NCE? -- The NCE is, therefore, a useful 
measure for evaluators, but what does it mean in terms of, let's say, 
reading skill? A few examples may give you some ideas. Suppose you 
compared two second graders: one who reads at the average level for 
second graders and the other (a very good reader) who reads at the 
average third grade level. The one who reads at the second grade 
level would get an NCE score of about 50. The better reader would get 
an NCE score of about 70 or 80 (it is possible to figure this out by 
studying the norms tables from standardized reading tests). In other 
words, a difference of roughly 20 to 30 NCEs represents the difference 
in skill between an average second grade reader and an average third 
grade reader. 

By the time students reach junior high school, the average student has 
developed his or her basic reading skills considerably, and the 
difference from year to year is not so great as it was at the second 
grade. At the junior high school level, this difference works out to 
roughly 10 NCEs. 

As another example, think of comparing good and poor readers at a 
single grade level. Poor readers in special programs, such as Title 
I, often average around the 20th percentile. This corresponds to an 
NCE score of 32. An 80th percentile reader (NCE = 68) would be a 
fairly good one. In round numbers, then, a rather poor reader must 
improve about 20 NCEs to become an average reader. A gain of 40 NCEs 
would take a reader from "rather poor" to "quite good." Similarly, an 
80 NCE gain (from NCE = 10 to NCE = 90) would take a student from 
very poor to very good. 

One final example may add to your sense of how big an NCE really is. 
Suppose that you taught two classes of students in reading, each with 
about 20 to 30 students. Suppose further that each was a fairly 
normal class with a normal range of reading abilities. Now suppose 
that your evaluator told you that one class, on the average, was 
slightly better than the other. How small an average difference 
(measured in NCEs) could you expect to detect just by working with the 
students? 

The answer appears to be "somewhere around seven NCEs." That is, if 
the average scores of the two classes are within seven NCEs of each 
other, you probably would notice little if any difference between 
classes. With differences greater than seven NCEs you would begin to 
be aware that one class was noticeably better than the other. 



To summarize: 



o Less than 7 NCEs is scarcely noticeable to an 
observer. 

o The difference between second and third grades is 
about 20 to 30 NCEs. 

o By junior high school, one grade level is down to 
about 10 NCEs. 

o A difference of 20 NCEs is quite noticeable. It 
is the difference between "average" and "rather 
poor" or between "average" and "quite good." 

o A difference of 80 to 90 NCEs is the difference 
between the very poorest readers and the very best 
readers in the typical district. 



How Much Noise is There in Measures of Student Performance? 

Error of Measurement in a Single Student's Score -- The answer to the 
noise question is "It depends on whether we are talking about an 
individual student's score or about an average (mean) score for a 
group of students." There is almost always a certain amount of random 
error in a single student's test score. For standardized reading 
tests, this error will fall somewhere within the range of about ±10 
NCEs for the majority of students, but for some it will be even 
greater. For about five percent of the students (one out of 20), the 
error may be greater than ±16 NCEs. Young students (e.g., second 
grade) tend to have somewhat more error in their scores than do older 
students, but we are speaking very roughly here and the figures given 
above are close enough for our purposes. 



Error of Measurement in the Mean Score for a Group of Students -- You 
can see that the amount of error in an individual score can sometimes 
be very large and that you must, therefore, be very cautious about 
assigning a student to a special program or to special materials on 
the basis of a single test score. Fortunately for the evaluator, 
however, the error of measurement in the mean score for a group of 
students tends to be much lower than the error in individual scores. 
This is because the positive and negative errors from the different 
students tend to cancel out. In fact, for very large groups of 
students, the random error cancels out almost entirely and the mean 
score for the group is certain to be very accurate. 

Of course, the amount of error in any particular single score or group 
mean cannot be calculated in most evaluations. However, a simple 
calculation gives us a good idea of how much error is likely to be 
present. If we know for a given test that about five percent of the 
individual student scores will have errors of ±16 NCEs or greater, we 
simply divide by the square root of the number of students in our 
group to get the range of likely errors in the mean score for a group 
of this size. For example, suppose we have 25 students in the group. 
The square root of 25 is 5. Sixteen NCEs (the range that covers the 
errors in most of the individual student scores) divided by 5 equals 
about 3 NCEs: 

    16 NCEs / √25 = 3.2 NCEs. 

Thus, when looking at groups of 25 students, about five percent of the 
group mean scores will be in error by more than ±3 NCEs. 

Similarly, for groups of nine students: 

    16 NCEs / √9 = 5.3 NCEs, or about 5 NCEs. 

So, about five percent of group means for groups of nine students will 
be in error by ±5 NCEs or more. For the other 95 percent, the errors 
will be smaller. With 4 students, the range goes up to ±8 NCEs: 

    16 NCEs / √4 = 8 NCEs. 
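
The rule of thumb just worked through (divide the individual-score error by the square root of the group size) can be written as a small function. The ±16 NCE figure is the Handbook's; the function name and the extra group size of 100 are added here only for illustration.

```python
import math

# From the text: about five percent of individual scores are in error
# by this many NCEs or more.
INDIVIDUAL_ERROR_NCE = 16.0


def likely_error_in_mean(n_students, individual_error=INDIVIDUAL_ERROR_NCE):
    """Range of likely error (in NCEs) in the mean score of a group."""
    if n_students < 1:
        raise ValueError("a group needs at least one student")
    return individual_error / math.sqrt(n_students)


for n in (4, 9, 25, 100):
    print(f"{n:3d} students: about ±{likely_error_in_mean(n):.1f} NCEs")
```

Because the error shrinks only with the square root of the group size, quadrupling the number of students merely halves the likely error in the mean.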



Error of Measurement When Comparing Two Groups — There is one further 
complication to be aware of. When one compares the mean scores of two 
groups (or the same group at two different times), each will include 
some error, and the error in the difference score may be greater than 
in either score by itself. For example, suppose you test a group of 
nine students at the end of second grade, and again at the end of 
third grade. Suppose further that the group mean is 20 NCEs in the 
second grade and 30 NCEs in the third grade. 

Our first reaction is to say that they have improved by 10 NCEs (a 
small, but probably noticeable improvement). But we also know that 
each of the scores could be in error by ±5 NCEs. Could the 10 NCE 
gain be in error by double this amount? In the worst case, could 
there be a combined error of -10 NCEs, or in other words, no gain at 
all? 

Statisticians can show that an error this large is not likely. To 
find the error in the difference between two scores, we should not 
multiply the error for a single score by two. The correct multiplier 
is the square root of two (which is about 1.4). In our example, 1.4 
times ±5 NCEs is about ±7 NCEs. Thus, our apparent gain of 10 NCEs could 
actually be a true gain of 3 NCEs (i.e., 10 - 7 NCEs). Of course, it 
could also be a true gain of 17 NCEs (10 + 7 NCEs). In fact, about 
five percent of such groups (nine students) with real gains of 10 NCEs 
will appear to have gains greater than 17 NCEs or less than 3 NCEs. 
For groups of 25 students, the range is about 1.4 times ±3 NCEs or 
±4.2 NCEs. 
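
The same arithmetic can be sketched as follows. The sketch assumes, as the text implicitly does, that the two mean scores carry errors of the same size that are independent of each other; the function names are invented for the example.

```python
import math


def likely_error_in_difference(error_in_each_mean):
    """Likely error in the difference between two means (equal, independent errors)."""
    return math.sqrt(2) * error_in_each_mean


def plausible_gain_range(observed_gain, error_in_each_mean):
    """Lower and upper bounds on the true gain implied by an observed gain."""
    error = likely_error_in_difference(error_in_each_mean)
    return observed_gain - error, observed_gain + error


# Nine students: each mean may be off by about ±5 NCEs; observed gain of 10 NCEs.
low, high = plausible_gain_range(10, 5)
print(f"true gain plausibly between {low:.0f} and {high:.0f} NCEs")

# Twenty-five students: each mean may be off by about ±3 NCEs.
print(f"±{likely_error_in_difference(3):.1f} NCEs")
```

An observed gain that is smaller than the likely error in the difference cannot be distinguished from no gain at all.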

Analyzing the Data for Program Impact Evaluations 

Once information has been analyzed for student performance, the next 
step is to analyze data for determining program effectiveness. 
Analyzing the data for program impact requires a demonstration that 
the program has had an impact on student performance. It must be shown 
that student performance is better than expected, and that the program 
and nothing else is responsible. This does not require any special 
analysis of the data. It requires the use of data from the program 
operations evaluation component and student outcomes to build a 
convincing argument. In addition to the three analytic steps 
described above, proving program impact will require three basic 
elements to build a convincing argument. These are: 

o Evidence that students have improved their 
performance. This type of information documents 
that similar students in the same schools had 
lower scores in the past. This requires compiling 
data from several different years. 

o Evidence that non-program students have not made a 
similar improvement. This type of information 
examines the possibility that something outside of 
the bilingual program, such as a new principal or 
a new district-wide curriculum, is responsible for 
the improvement in bilingual student performance. 
This information can only be generated by having 
local comparison groups, preferably from 
district-wide test data. 

o Evidence that the characteristics of the bilingual 
students have not changed since entry into the 
program. In some districts, the student 
population can change drastically over a period of 
a year or two (as when large numbers of new 
arrivals enroll). Some evidence that changes in 
student population are not responsible for the 
changes in student test scores must be 
presented. 



Analyzing evaluation data, especially program impact evaluation, is
careful, systematic detective work. It consists of looking for clues
and following up any leads that may help to explain the effects (or
lack of effects) that are observed in the data. A clever and thoughtful
evaluator can often build a convincing case by assembling a variety of
evidence. Unless it is specifically required that the impact of the
program be assessed, it is better to spend the effort in developing
the instructional program.

Other Issues in Data Analysis

Single year vs. multi-year analyses -- Many bilingual program
evaluations are only concerned with measuring the effects of the
program for a single year. These evaluations are not convincing
evidence of program effectiveness. It is, therefore, necessary to
demonstrate that there is continuing year-to-year progress toward
program goals.

Effects of attrition on multi-year evaluation -- The effects of
student attrition on multi-year evaluations are a problem that all
evaluations must be concerned with. Multi-year evaluation means
following the same students over a period of years. However, as
students transfer out of the program, the number of students in the
program gets smaller and smaller until the groups may not be large
enough for drawing any comparisons. Another problem is that the ones
who transfer will probably be different in many ways from the ones who
stay. While multi-year evaluation can give you very useful
information, it may be impossible to interpret these results since the
program may experience constantly changing students.
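
One simple way to see whether the students who leave differ from those
who stay is to compare the pretest averages of the two groups. The
following is a minimal sketch only; the function name and the data
layout (a dictionary of pretest NCEs keyed by student ID) are
assumptions made for illustration.

```python
from statistics import mean


def attrition_check(pretest_scores, posttest_ids):
    """Compare pretest means of students who stayed versus those who left.

    pretest_scores: dict mapping student ID -> pretest NCE
    posttest_ids:   set of IDs for students who also took the posttest
    """
    if not pretest_scores:
        raise ValueError("no pretest scores supplied")
    stayers = [s for sid, s in pretest_scores.items() if sid in posttest_ids]
    leavers = [s for sid, s in pretest_scores.items() if sid not in posttest_ids]
    return {
        "n_stayers": len(stayers),
        "n_leavers": len(leavers),
        "attrition_rate": len(leavers) / len(pretest_scores),
        "stayer_pretest_mean": mean(stayers) if stayers else None,
        "leaver_pretest_mean": mean(leavers) if leavers else None,
    }
```

A large gap between the two pretest means, or a high attrition rate,
suggests that gains for the remaining group should be interpreted with
caution.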

Floor and ceiling effects -- Floor and ceiling effects are pervasive
problems in bilingual program evaluation. A minimal check for these
effects on multiple-choice tests is performed by making sure that
classroom means or school raw scores are no lower than 25 percent of
the correct items on four-choice tests, 33 percent for three-choice,
and so on. Mean raw scores should not exceed lb percent of the total
possible raw score on any test. Outside of these values, the
likelihood of floor or ceiling effects, respectively, should be noted
in the report.
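
The floor check described above amounts to comparing the mean
proportion correct with the chance level for the test (one divided by
the number of answer choices). A minimal sketch follows. The function
names are illustrative, and the ceiling cutoff is deliberately left as
a parameter to be supplied by the evaluator rather than fixed in the
code.

```python
def chance_level(n_choices):
    """Proportion of items expected correct from guessing alone."""
    return 1.0 / n_choices


def flag_floor_ceiling(mean_raw, total_items, n_choices, ceiling_fraction):
    """Return a list of warnings for one classroom or school mean.

    ceiling_fraction is supplied by the caller (an assumption of this
    sketch); it is the proportion of the total possible raw score above
    which a ceiling effect is suspected.
    """
    proportion = mean_raw / total_items
    flags = []
    if proportion < chance_level(n_choices):
        flags.append("possible floor effect")
    if proportion > ceiling_fraction:
        flags.append("possible ceiling effect")
    return flags
```

Any classroom or school for which the returned list is not empty would
be noted in the report.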

Other Analysis Techniques 

Additional analytic techniques such as analysis of variance (ANOVA),
analysis of covariance (ANCOVA), and regression techniques can also be
used in analyzing the data. These sophisticated statistical analyses
can be found in most textbooks on evaluation and are not covered in
this volume. This is because these approaches require many special
conditions (like random assignment of students to different
treatments, and large numbers of students in each group) that simply
cannot be met in most bilingual programs. The following guidelines
should be used when conducting the data analysis activity.

Guidelines for Data Analysis

I.   General principles

     A.   Analyze data both by individual years for short-term goals
          and cumulatively for long-term goals.

     B.   Separate data according to language proficiency groups.

     C.   Separate data further according to instructional
          treatment.

II.  Preparation (applies to most evaluation designs)

     A.   Convert raw scores to standard scores (preferably
          normalized standard scores such as NCEs). Use these scores
          for all analyses.

     B.   Separate out those students with both pre- and posttests.

          1.   Compute means and standard deviations.
          2.   Plot the distributions of pretest scores.
          3.   Plot the distributions of posttest scores.
          4.   Plot the joint distribution of pretest and posttest
               scores.

     C.   For students with pretest scores only:

          1.   Compute the mean and standard deviation.
          2.   Plot the distribution of scores.

     D.   For students with posttest scores only:

          Save the scores for student files and for use as next
          year's pretest scores.

III. Check for irregularities in the data:

     A.   Floor or ceiling effects.

     B.   Large changes in standard deviation from pretest to
          posttest.

     C.   Low correlations between pre- and posttest scores, or
          irregular joint distributions.

     D.   Differences between students who took the posttest and
          those who dropped out.

     E.   Look for any other features of the data that strike you as
          strange, and be sure that you can explain them. Ideally,
          item data should be examined.

IV.  Apply the statistical or other procedures relevant to the
     particular evaluation design in use.

Be sure that your analyses are relevant to the questions you are
trying to answer.
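
Steps II.B through II.D and the checks in III.B and III.C can be
sketched in a few lines. This is an illustration under stated
assumptions, not a prescribed procedure: the record layout (student ID,
pretest NCE, posttest NCE, with None for a missing score) and all
function names are hypothetical.

```python
from statistics import mean, stdev


def split_by_testing(records):
    """Partition students by which scores are on file (steps II.B to II.D).

    records: iterable of (student_id, pre_nce, post_nce); a missing
             score is represented by None.
    """
    both, pre_only, post_only = [], [], []
    for student_id, pre, post in records:
        if pre is not None and post is not None:
            both.append((student_id, pre, post))
        elif pre is not None:
            pre_only.append((student_id, pre))
        elif post is not None:
            post_only.append((student_id, post))
    return both, pre_only, post_only


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize_matched(both):
    """Means, standard deviations, gain, and pre-post correlation."""
    pre = [p for _, p, _ in both]
    post = [q for _, _, q in both]
    return {
        "n": len(both),
        "pre_mean": mean(pre),
        "pre_sd": stdev(pre),
        "post_mean": mean(post),
        "post_sd": stdev(post),
        "mean_gain": mean(post) - mean(pre),
        "pre_post_r": pearson_r(pre, post),
    }
```

A large difference between the two standard deviations, or a low
correlation, is a signal to examine the data more closely before
reporting gains. The plots called for in the guidelines are not shown
here.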

8. Interpreting the Results of the Evaluation

The analysis of student outcome data described above provides the
program director and evaluator with the quantitative information on
student performance. If a norm-referenced test was used, the data
will show how the bilingual students compared in achievement to a
national norm group. Hopefully, the results will show that bilingual
students achieved as well or better. These results, however, do not
provide answers as to why the students achieved. The answer to this
question may possibly be found by carefully examining the results
emanating from the evaluation of program operations.

The evaluator should understand that the two components of the
evaluation model, the discrepancy evaluation of program operations and
the evaluation of student performance, are not methodologically linked
together. As a matter of fact, each component may stand alone. The
baseline data developed for the evaluation of program operations,
however, do play a role in designing the evaluation of student
performance. That is, the baseline data provide information to
determine what outcome areas should be evaluated.

In addition, the results of the program operations evaluation can
provide the evaluator with valuable information on how the program was
operated, the instructional approach used, the amount of instruction
provided in the first language for each academic subject area, etc.
This information can be used to "understand" the results of the
student outcomes component of the evaluation. This information
is valuable to a perceptive evaluator wishing to find answers to
explain student performance. For example, if the discrepancy
evaluation shows that history was taught using the first language to
fourth grade students, but not to students in the fifth grade, the
evaluator may want to closely examine the test scores in history for
those two grades. Depending on what the test scores show, the
evaluator may be able to make some assumptions about what caused either
the same or different levels of performance. The evaluator may then
want to examine more closely "how" the instruction was provided. For
example, the evaluator may want to ascertain the level of language
proficiency of the teacher teaching in the first language or compare
the language assessment scores, if available, of the students in the
two grades. All this information, when processed together, could
provide clues for understanding what caused the level of performance.



Because the two components of the evaluation are not methodologically
linked, there are no specific procedures that can be described for
merging the two sets of data. Nevertheless, the recommended approach
provides the evaluator with a significant amount of information to use
in arriving at conclusions about the program. The analysis techniques
required for the evaluation, as described earlier, are relatively
simple and can usually be performed by following the instructions in
the test manuals as well as the discrepancy procedures described in
this Handbook. The other ingredient is the creativity of the
evaluator and project director in using the information to better
understand the program and how it might have affected student
performance.

Basically, two general categories of information will be gathered by
the recommended evaluation activities outlined in the Handbook. One
category includes facts such as the number of students, the
instructional methods used, the test scores of students, etc. The
other category includes opinions generated by this information, such
as whether there should be more or fewer students in the program,
whether the instructional methods used are appropriate, and whether
the test scores are as high as they should be. It is essential to keep
this distinction in mind when reporting information about the program
evaluation.

The general approach in reporting evaluation results should be, first,
to present the facts and, second, to present opinions about these
facts, clearly identifying the source of the opinions. For example,
when discussing test scores, the fact may be that, as a group, the
bilingual students gained ten normal curve equivalents (NCEs) from
pretest to posttest time. If presented with this information,
different people may interpret this fact in different ways.
Differences in interpretations may result from differences in
understanding of how much gain is typical in a bilingual program, the
nature of the students involved, the instructional methods used, etc.
Therefore, the report must include careful interpretation of the
data.

The procedures and results of the evaluation should be clearly
described. For example, the goals of the English language component
may be:

     1.   Students will gain seven or more NCEs in reading,
          compared to similar students not in the program,
          as determined by comparing their average NCE gain
          from pretest to posttest on the New Improved Rural
          Achievement Test with students in the test's norm
          group.

     2.   Students will gain seven or more NCEs in language
          arts skills compared to similar students not in the
          program, as determined by comparing their average
          NCE gain from pretest to posttest on the New
          Improved Rural Achievement Test with students in
          the test's norm group.

Following a statement of the goals, a description of the evaluation
procedures used to evaluate these goals should be presented. These
descriptions should include the measurement instruments used, the data
collected, and the analysis procedures. In addition, any information
about the evaluation process that would affect interpretation should
also be discussed. For example, a description of the evaluation
procedures related to the above goals may be stated as follows:

     Attainment of the goal was measured by administering
     the New Improved Rural Achievement Test to all students
     in the program during the first week of October and
     again in the last week of April (the same times when
     the norming population was tested). Teachers were
     trained to administer the tests and did so within their
     classrooms. The analyses performed were a comparison
     of the pretest-posttest average NCEs to determine the
     amount of gain as compared to that of the norm group.
     Separate analyses were conducted for the two content
     areas (reading and language arts), for each grade level
     (2-6), and for students at two different levels of
     English language proficiency. (Students were
     categorized by these levels of language during the
     selection process for entry into the program.)
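
The separate analyses described in this example can be sketched as a
grouping of NCE gains by content area, grade, and proficiency level,
with each group's average gain compared against the stated goal of
seven NCEs. Because NCEs are defined relative to the norm group, a
group that merely keeps pace with the norm group shows an average gain
near zero. The field names and the function name below are
hypothetical and are used only for illustration.

```python
from collections import defaultdict
from statistics import mean

GOAL_NCE_GAIN = 7.0  # from the example goals: "seven or more NCEs"


def gains_by_group(records):
    """Average NCE gain for each (area, grade, proficiency) group.

    records: iterable of dicts with the hypothetical keys
             area, grade, proficiency, pre_nce, post_nce.
    """
    grouped = defaultdict(list)
    for r in records:
        key = (r["area"], r["grade"], r["proficiency"])
        grouped[key].append(r["post_nce"] - r["pre_nce"])
    return {
        key: {
            "n": len(gains),
            "mean_gain": mean(gains),
            "goal_met": mean(gains) >= GOAL_NCE_GAIN,
        }
        for key, gains in grouped.items()
    }
```

The resulting table of group sizes, average gains, and goal attainment
supplies the facts; the interpretation of those facts is reported
separately.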

This description should be followed by a presentation of outcomes
related to specified goals. The presentation of the outcomes of the
evaluation should include two parts. First, the results of the
evaluation measurement (i.e., test scores) should be reported. Then a
judgment or well-reasoned discussion about the meaning of the results
should be offered. These discussions should explain why the program
is considered to be responsible for the observed outcomes, or
conversely, why the results should not be attributed to the program.

This information should be used to make interpretative comments about
the results. Since these comments will inevitably be somewhat
subjective, it is important to clearly note whose interpretation is
presented. Interpretations may be made by the evaluator based on
opinions gathered from program personnel, parents, and administrators.
In some cases, an interpretative panel may be established officially
to review and interpret the data. Recommendations which logically
stem from the results and interpretations are presented in the final
section of the report, since the recommendations generally are derived
from several sets of results or interpretations (e.g., looking jointly
at student outcomes and parent involvement).

The recommendations made for program change should stem from a careful
review of all the descriptive information and evaluation results and
interpretations presented thus far. The recommendations may best be
generated by a team consisting of program staff and the evaluator.
However generated, the recommendations should be reviewed by the
program director and selected staff to ensure that no major factors
which influence the results have been overlooked. Recommendations
should then be organized according to the aspect of the program they
relate to -- program operations, parent involvement, staff
development, or student effects.

CHAPTER V



PREPARING THE EVALUATION REPORT 

Preparation of the final evaluation report is an important activity of 
the evaluation. The evaluation report is the final and most visible 
product of the evaluation. Steps should be taken to assure that the 
report addresses the purposes and specific questions of the 
decisionmakers for whom the evaluation was planned. In addition, the 
evaluation results should be reported in a timely manner, taking care 
to ensure that the technical aspects of the evaluation effort are 
clearly presented. Together, these steps increase the usefulness of
the evaluation results.

Preparation of the final evaluation report can be a time-consuming and
burdensome process if not properly planned. However, reporting should
be a continual process occurring throughout the evaluation cycle. As
recommended in earlier chapters, brief summaries or reports on
specific activities of the evaluation (e.g., classroom observations)
should have been prepared and shared with program staff as well as
with key decisionmakers. For example, Chapter III recommended that
following each classroom observation, a brief report should be
prepared. These brief reports were in turn to be summarized at least
three times during the program year (fall, winter, and spring) and
were to be shared with program personnel so that they could become
part of the program improvement process. Thus, these brief reports
and summaries prepared throughout the evaluation cycle can all feed
into the final evaluation report, simplifying the reporting
process. The preparation and sharing of evaluation information
throughout the evaluation cycle also serves to strengthen
communication between the evaluation audiences and those conducting
the evaluation, thereby increasing the use of evaluation results.

The focus of this chapter is the preparation of the final evaluation
report. The suggestions and guidelines in many cases also apply to
the reporting mechanisms recommended throughout the evaluation cycle.
The information in this chapter will prove useful to program personnel
involved in the evaluation effort as well as to the person(s)
responsible for preparing the final evaluation report.

There are a number of basic principles which pertain to the reporting
process and serve to simplify preparation of the final evaluation
report. This discussion assumes that completion of the report is the
primary responsibility of the program evaluator(s) contracted to
undertake major segments of the bilingual program evaluation.
Basically, the evaluator has three important tasks: develop an
understanding of the audiences who will use the information, select a
proper reporting format(s), and assist the audiences in using the
results. Proper planning of the reporting requirements will make this
final activity easy to complete.

1. Develop an Understanding of the Audiences

The evaluator must understand that clear communication requires
knowledge and understanding of the evaluation audiences. The
identification of the audiences should have been completed during the
planning stages. However, it is helpful to review who the audiences
are at the time of reporting. The evaluator communicates with the
audiences to identify their information needs and their understanding
of evaluation issues, such as testing. This will help the evaluator
to tailor the report specifically to the level of understanding of the
audiences and to determine the best form in which to report the
results. Contact with the audiences also increases the probability
that evaluation results will in fact be used.

Understanding the role played by the various audiences in using the
evaluation results is also crucial. Some may be involved in
clarifying the results of the evaluation, while others will be
involved in interpreting these results. Still others are involved in
making decisions, and thus are considered to be the key audiences.
The roles of the audiences determine the time at which information is
reported to them. For example, those involved in clarifying the
results enter the reporting process somewhat earlier than those who
aid in interpreting the results and making recommendations.
Understanding the roles of the audiences assists the evaluator in
directing the evaluation report to the proper decisionmakers.

2. Select a Reporting Format(s)

Evaluation reports can take different forms, but whatever the form,
the report should be designed for a specific audience and be presented
in a manner that allows for response and interaction. Although the
most common format is a written report which describes the entire
evaluation, consideration should be given to alternative versions for
various groups.

A news release is a type of written report. Because news reporters do 
not have the time to read full evaluation reports, there is a risk 
that they may write an inadequate or inaccurate news article. To 
avoid this, preparation of a news release is recommended. The 
newspaper will probably adapt the news release to its own style and
size limitations. In some cases, a press conference may be held for 
reporting the results to television, radio, and newspaper reporters. 
Interviews with representatives of the media are even more common. 
These may be taped for broadcast on television or radio, or they may 
be the basis for an article by a print journalist. 

Oral presentations are also a major vehicle for reporting to 
professional audiences such as teachers and program staff. Oral
presentations are particularly important for highlighting the major 
findings, conclusions, and recommendations, and for establishing 
two-way communication that will clarify, interpret, and influence 
decisionmaking. Such presentations can be enhanced by a panel 
discussion and/or small group discussions of the reported results. 

Whatever reporting formats are used, the evaluator must focus on the 
audience(s) and their specific needs. The amount of attention given 
to the form of reporting may make the difference between a report that 
is simply received and one that influences practice. 

Several standard elements should be included in the report. These
include:

     o    Statement of purpose;

     o    Program overview and background;

     o    The goals and objectives of the bilingual
          program;

     o    Description of the program and students;

     o    Discussion of the methodology used, including
          design, sampling strategy, instrumentation, and
          data analysis procedures; and

     o    Presentation of the findings, conclusions, and
          recommendations for program change.

The report should be concise and should include easily interpreted
tables, graphs, and other figures, limiting the amount of narrative
material presented. Important issues should be identified and
highlighted in the report if the usefulness of the evaluation results
is to be maximized. Techniques such as boxing in recommendations or
using a different typeface are useful to highlight the most important
points of the report. Examples of actual data collection instruments
should be included in an appendix.

3. Assist the Audience in Using the Results

Once the written report is completed, copies must be submitted to the
funding agency. Plans should also be initiated to present the results
of the evaluation to specific audiences. Consideration must be given
to identifying the appropriate person responsible for presenting the
results. It is recommended that this be the program director and the
evaluator. A decision as to which of the two will report to which
audiences is dictated by the situation and deserves careful
consideration.

Arrangements should be made to present the results of the evaluation
to the staff, parent groups, school boards, and school administrators.
Presentations should include a verbal discussion of the evaluation
procedures and findings as well as a discussion of the implications of
the findings. Ample time should be available for questions and
answers.



Even though most of the information presented at such a meeting is
contained in the evaluation report, it cannot be assumed that the
audience has either read or understands the complete report. Oral
presentations of evaluation findings frequently enhance the
credibility of the evaluation and provide the evaluator with important
feedback on the comprehensibility of his/her written work. This can
be very helpful in improving subsequent evaluation products. Finally,
a personal explanation of the evaluation provides evaluation users
with an opportunity to ask questions and receive answers and
explanations, something that simply reading the report cannot
accomplish. Worksheet No. 17, which follows the instructions for its
use, provides a detailed outline for the report.

How to Use Worksheet No. 17 -- This worksheet serves as an outline or
checklist which can be used to ensure that all necessary information
is included in the report. Generally, Worksheet No. 17 follows the
format of this Handbook. The report outline provides a format for the
presentation of facts and opinions about the bilingual program. Four
major categories of information are presented: evaluation summary,
program overview and description, program and student effects, and
recommendations. Each of these is discussed below.

Evaluation Summary Information -- This summary information provides a
concise overview of the evaluation findings, conclusions, and
recommendations. This section of the evaluation report, commonly
referred to as the Executive Summary, is a three-to-five page section
which should provide the reader (who may be totally unfamiliar with
the program) with a brief overview of the program's purpose and
structure, as well as a concise description of how well the program is
operating and accomplishing its goals. Specific data indicating
student and program outcomes should be presented. Recommendations for
program changes based on the data should also be included. The
Executive Summary is often the only section of the report read by the
most influential audiences. The Executive Summary can be provided to
[...] information contained in the complete report. The full report,
however, should be made available to interested parties requesting the
report.



Program Overview and Descriptive Information -- This information
reports on several of the evaluation activities. Overall, factual
information is presented about the type of students in the program
(e.g., language proficiency, achievement level, biographic data,
etc.), their needs, program goals, methods of operation, student
selection criteria, instructional approach, etc. In addition, this
section presents factual information on the purpose of the
evaluation, its design, and the audience(s) whom the evaluation is
intended to serve.

Program and Student Effect Information -- This information reports on
the more technical aspects of the evaluation, which include opinions
or evaluative information on the success or failure of the bilingual
program. Included is information on each program goal or operation
that was evaluated as well as a description of the evaluation
procedures used to evaluate each goal. This description should be
followed by a presentation of the outcomes related to the specific
goals. Included in this description is a discussion of the related
results as well as an interpretation of the results.



Recommendations -- The recommendations made for program change stem
from a careful review of all the descriptive information and
evaluation results and interpretations presented thus far. The
recommendations may best be generated by a team consisting of program
staff and the evaluator. However generated, the recommendations
should be reviewed by the program director and selected staff to
ensure that no major factors which influence the results have been
overlooked. Recommendations should then be organized according to the
aspect of the program they relate to -- program operations, parent
involvement, staff development, or student effects. The
recommendations may relate to changes in goals or changes in the way
tasks are carried out.

OVERVIEW



This document represents the third and final volume of the Handbook
for Evaluating ESEA Title VII Bilingual Education Programs. The
Handbook provides practical guidelines and recommended approaches for
bilingual education program directors and evaluators to use in
evaluating bilingual programs.

In the development of the Handbook, it was readily recognized that a
single document could not be equally suitable to all bilingual
education programs. Obviously, bilingual education programs cover a
range of languages and grade levels in a variety of settings. In
addition, some programs have large evaluation budgets and access to
teams of highly sophisticated evaluators, while others have limited
budgets and no evaluation specialists at all. Thus, Volume III,
entitled Technical Appendix, contains a collection of reference
material addressing various evaluation issues, as well as lists of
tests available. These are intended to assist program directors and
program evaluators in building upon or expanding the evaluation
activities identified and discussed in Volumes I and II. The appendix
also contains full-size reproducible copies of all the worksheets
contained in Volume II.

The volume is divided into three sections. Section One includes a
fairly comprehensive list of references relevant to the evaluation of
ESEA Title VII bilingual education programs. Section Two includes
reference documents addressing issues related to evaluation and
testing. This section also contains lists of tests which may be used
in the evaluation. The section also includes a reference paper on
ethnographic methods for describing bilingual programs. Section Three
of this volume includes a set of worksheets for use with Volume II of
the Handbook. The inclusion of the worksheets in this volume is
intended to facilitate the reproduction, dissemination, and use of the
worksheets.

Volume I, entitled The User's Guide on Evaluation Basics, summarizes
evaluation procedures and describes the five components of a bilingual
education program evaluation. These include: planning, managing, and
staffing the evaluation; establishing baseline data required for
evaluation; monitoring program operations; evaluating student
outcomes; and analyzing and reporting evaluation results.

Volume II, entitled The Designer's Manual for Conducting an
Evaluation, describes how to implement each of the components. The
Designer's Manual contains recommended approaches, forms, and
worksheets, all designed to assist the program director and/or program
evaluator in completing the specific tasks associated with the overall
program evaluation.

SECTION I



REFERENCES 



The following is a fairly comprehensive list of references pertinent 
to the evaluation of ESEA Title VII bilingual education programs. 
Many of the more technical issues discussed in the Handbook can be 
found in these publications. Program directors and evaluators are 
encouraged to familiarize themselves with these publications.

SECTION II



OBSERVATIONS ON TESTING 



This section contains reference material to provide program directors
and evaluators with a theoretical and practical background on testing
issues as well as a series of descriptions of several
testing/evaluation instruments. The material provides information on
the selection of achievement tests, language proficiency tests, and
self-concept scales. Included are abstracts and/or test summaries of
tests and scales often used in the evaluation of bilingual programs.
The documents in this section are included in order to make this core
of information readily available to program directors and evaluators,
thereby facilitating their evaluation activities.

An additional document found at the end of this section is a 
presentation and discussion of ethnographic methods to develop a 
program description. 

OUT-OF-LEVEL OR FUNCTIONAL-LEVEL TESTING



Purpose 



This document is designed to give teachers, parents, and administrators
a simple overview of the concept called functional-level testing. It
can be used separately for an awareness of the topic or with other
available resources to promote skills for matching test levels to
student achievement levels.

The information presented in this document addresses five questions
about functional-level testing. Each of these questions is
identified and discussed in detail in the following sections.

How Do I Know If I Need to Use Functional- Level Testing? 

When bilingual teachers evaluate the effectiveness of their projects,
one piece of evidence to consider is the students' improvement on an 
achievement test. Students' scores at the beginning of the project 
can be compared to their scores at the end. This comparison will 
provide a true picture of the students' improvement if the teacher has 
accurate measures from the test. 


A test that is too difficult or too easy may provide very little
information about students' actual achievement levels. Students who
are frustrated by a test that is too difficult may give up early, or
they may simply guess their way through the test. If a test is too
easy, students will find it unchallenging. In either case, test
scores will not provide an accurate indication of their achievement
level. Such results are a major concern of teachers, especially when
they realize before testing that most items on a test are too
difficult or too easy for some students.

Functional-level testing is an alternative that can be helpful in
situations like these. Because functional-level testing results in
improved information by matching a student's ability with the
difficulty of the test he or she takes, it has been recommended in
current evaluation guidelines. This paper will provide information
about functional-level testing: what it requires and how it can be
implemented.

What Are Test Levels? 

Many initial questions need answers when a commercial publishing house 
plans development of an achievement test battery. For example, the 
publisher must determine which basic topical areas will be measured, 
what span of grades the test should cover, and the length of time
required for test administration. Other major considerations include
reading and vocabulary levels of the test items, specific content to
be covered within the given topical areas, and the relative interest
and difficulty of the material on which the test will be based. In
weighing these considerations, the publisher understands that a single
test covering all grades would be much too long and inefficient to
administer to any student (see Figure 1). One solution is to publish
a series of tests, each known as a level.* A test level may be
defined as one of a number of strata, the content and difficulty of
which are appropriate to a given grade or span of grades. Note that
in Figure 2 Level C covers a span of grades from the second through
eighth.



[Figures 1 and 2 not reproduced. Both are drawn along a GRADE axis
ending at grade 12.]



* The term level must be clearly distinguished from the term form.
Form, more appropriately termed alternate form or equivalent form, is
a second test at a given level designed to measure the same content
using a different, but equivalent, set of test items.
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Next, the publisher must decide how many levels the planned series 
should incorporate. If only a few levels will be developed, each must 
cover the content for several grades. For example, the test levels 
shown in Figure 3 cover a broader range of grades than those in Figure
4. The broader the content covered by a given level, the more likely
that item content and difficulty will be appropriate for the low or
high achieving students, perhaps both. Narrow content coverage
within a level may be more relevant for a single grade (see Figure 4).
However, focusing on such narrow content coverage can result in too 
many tests and be too costly. 



[Figures 3 and 4 not reproduced. Both show test levels.]



After considering which topics to cover, what content to cover within 
topics and the difficulty of content, test publishers select items for 
inclusion in a test series. Each level, designed for typical students 
in a given grade or span of grades, is known as the recommended level . 
There is not always one test level for each grade level. Sometimes a
test level spans two or more grades.
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What is Functional Testing? 



Whenever a student is given a test level appropriate for his or her
functioning educational level, it is considered functional-level
testing. Most students' functional levels are best served when the
test publisher's recommended level for their grade is administered.
This practice is known as in-level testing. Functional-level testing,
however, allows testing at, below, or above the publisher's recommended
level.

What is Out-of-Level Testing? 

The recommended level of a test does not always contain the most
appropriate content or difficulty for students with very low or very
high performance levels. When testing such students, it may be
desirable to administer a test level other than the specific level
recommended by the test publisher for typical students in that grade.
This practice, called out-of-level testing, is employed when the
recommended test level is expected to be much too easy or too
difficult for the students.

The use of tests at levels below those recommended by the publisher is
an option if the content of the program can be measured better this
way. Students in bilingual programs may be learning skills, such as
English reading, at a later time than other students and therefore
should receive the same test at a later point. In order for any test
to be suitable, the average score of the group tested should be
between 1/3 and 3/4 of the maximum (Roberts, 1976). Otherwise,
ceiling or floor effects depress estimates of student gains. Some
publishers provide norms for the administration of a single test in
several grades. Other publishers provide expanded standard scores
that link up all levels of a test on a common scale, and occasionally
locator tests, to facilitate out-of-level testing. Generally, a test
should be used no more than one level below that recommended by the
publisher. But care should be taken that in testing out-of-level,
pretest floor effects are not being replaced by posttest ceiling
effects.
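
The 1/3-to-3/4 guideline above lends itself to a quick check. The
following is a minimal sketch, not a procedure from the Handbook: the
function name and the example scores are invented for illustration, and
treating the two boundaries as acceptable is an assumption.

```python
def score_range_check(raw_scores, max_score):
    """Compare a group's mean raw score with the 1/3-to-3/4 guideline.

    Returns "possible floor effect", "possible ceiling effect", or
    "suitable". The boundary values themselves count as suitable
    (an assumption; the guideline does not say).
    """
    if not raw_scores:
        raise ValueError("raw_scores must not be empty")
    mean = sum(raw_scores) / len(raw_scores)
    lower = max_score / 3.0          # 1/3 of the maximum score
    upper = 0.75 * max_score         # 3/4 of the maximum score
    if mean < lower:
        return "possible floor effect"
    if mean > upper:
        return "possible ceiling effect"
    return "suitable"


# Invented pretest scores on a 60-item test.
print(score_range_check([20, 30, 40], 60))
```

Running the same check on both the pretest and the posttest scores is
one way to watch for a pretest floor effect being traded for a posttest
ceiling effect.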



Why Test Out-of-Level? 

Achievement testing is used to obtain a reliable and valid measure of
student achievement. Factors contributing to unreliable and invalid
test scores may include test administration procedures (e.g., adhering
to timing and directions), physical surroundings (e.g., spacing of
chairs, temperature, lighting, etc.), student characteristics (e.g.,
motivation, physical well-being, etc.), and test characteristics
(e.g., difficulty level, content, format, etc.).

Although functional-level testing does not address all of these
concerns, it does consider test characteristics and has the potential
to affect students' motivation. Test characteristics of content and
difficulty level are very important. For example, consider test
content. Different levels of a test series emphasize different skills,
and the content can be quite different even though the subject area
remains the same. A selected test level should match the content
material taught. If a test does not match what students are being
taught, it will not be sensitive to learning, and gains which actually
occur may not be shown.

Now consider test difficulty. When a test is too difficult for a
student, guessing is likely to occur, creating problems for both the
reliability and validity of the resulting test scores. In turn, the
assessment of student achievement and the evaluation of programs are
affected. Guessing increases most students' scores in multiple-choice
tests. Some students' entire scores can be a reflection of the luck
involved in random guessing. The laws governing these scores based
upon random guessing are the same as those governing who wins and who
loses at Las Vegas; consequently, they are known as chance scores.
For example, if a group of students were to take a 100-item test with
four options per item, and randomly guess at all items, the average
score for the group would be approximately 25. Obviously, chance
scores do not provide accurate information about a student's level of
skill development. Students whose scores are primarily a result of
guessing on a test that is too difficult may need to be tested with an
easier, lower level of the test. In Figure 5 we see that the students
scoring in the chance range (shaded area) of Test Level C may need to
be tested at a lower test level, Level B in this case.

[Figure 5 not reproduced. It marks the Chance Score Range of LEVEL C
and a Test at Lower Level, LEVEL B.]

Figure 5. For students scoring in the chance score range the test
level was probably too difficult. They may need to be tested with a
lower level of the test.
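
The chance-score example (100 items, four options, an average of about
25) follows from the arithmetic of random guessing. The sketch below
works it out. The function names are invented here, and the one-
standard-deviation margin used to flag a score as "near chance" is an
assumption for illustration, not a rule given in this document.

```python
import math


def chance_score_stats(n_items, n_options):
    """Mean and standard deviation of the score from pure random guessing."""
    p = 1.0 / n_options                       # chance of guessing one item right
    mean = n_items * p                        # expected number correct
    sd = math.sqrt(n_items * p * (1.0 - p))   # binomial standard deviation
    return mean, sd


def near_chance(score, n_items, n_options, margin_sd=1.0):
    """Hypothetical screen: is a raw score no more than margin_sd
    standard deviations above the chance mean?"""
    mean, sd = chance_score_stats(n_items, n_options)
    return score <= mean + margin_sd * sd


mean, sd = chance_score_stats(100, 4)
print(round(mean, 1), round(sd, 1))   # 25.0 4.3
```

With these assumptions a raw score of 28 on the 100-item test would be
flagged as near chance, while a score of 40 would not.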



On the other end of the achievement spectrum there are students for
whom the test level is too easy, limiting such students' ability to
demonstrate their skill development. They too may need to be tested
out-of-level, but with a more difficult test. The shaded area in
Figure 6 depicts the high range for two test levels. Students scoring
in the high range of Level C may need to be tested with a more
difficult level of tests, Level D in this case.



[Figure 6 not reproduced. It marks the High Score Range of LEVEL C
and a Test at Higher Level, LEVEL D.]

Figure 6. Students scoring in the high range of a test level may need
to be tested with a higher level of the test.
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In addition to being misleading about a student's true skill level, a 
test that is either too easy or too difficult can misrepresent student
gains in achievement. Consider the following diagram in which the
line at the bottom represents all there is to know about a certain
topic and the lines above indicate the portions of the topic covered
by various test levels.



[Diagram not reproduced. A baseline runs from Zero Knowledge to 100%
Mastery. Lines above it mark the portions of the topic covered by
Level 1, Level 2, Level 3, and Level 4. Points A, B, and C are marked
on the scale, along with Apparent Gain (B-A) and Actual Gain (B-C).]

Suppose a group of students is given Level 3, but it is too difficult
for them. They may guess on many items and score in the chance range
of the test, at point A. Let's assume their posttest performance
would show improvement and they would score at point B. Their
apparent gain is the distance between the pretest and posttest (B-A).
However, if students had been tested at their functional level, Level
2 probably would have been given at the pretest. Guessing would be
less a factor since the test difficulty at this lower level is more
closely matched to student achievement. Their score may have been
something near point C. So their actual gain from pretest to posttest
is B-C.

In summary, the recommended test level for the average student in a
certain grade may not accurately measure the achievement level of
every student in that grade. Some students will function at a higher
achievement level, some at a lower level. In either case,
out-of-level testing could provide a better measure of student
achievement.
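
The difference between apparent gain (B-A) and actual gain (B-C) can be
written out directly. This is a minimal sketch with invented numbers.
It assumes all three scores are expressed on one common scale (for
example, a publisher's expanded standard scores), since raw scores from
different test levels cannot be subtracted meaningfully.

```python
def gains(pretest_in_level, pretest_functional, posttest):
    """Return (apparent_gain, actual_gain) on one common score scale."""
    apparent_gain = posttest - pretest_in_level     # B - A
    actual_gain = posttest - pretest_functional     # B - C
    return apparent_gain, actual_gain


# Invented scale scores for illustration only.
A, C, B = 300, 320, 350
apparent, actual = gains(A, C, B)
print(apparent, actual)   # 50 30
```

Whether the apparent gain overstates or understates the actual gain
depends on where point C falls relative to point A, which in turn
depends on the particular students and test.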
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SUMMARY 



Achievement tests provide useful information for evaluating the
effectiveness of programs. The value of such information is obviously
related to its accuracy. Achievement tests are designed to accurately
measure the achievement level of average students in a certain grade
level. However, they may not accurately assess the achievement level
of all students at that grade level.

A student's functional level may be below a test publisher's
recommended test level. And a very low test score on a recommended
test level may indicate that guessing (chance) played an important
role in the result. Students whose scores are primarily a result of
guessing on a test that is too difficult may need to be tested out of
level: tested with an easier, lower level of the test.

Functional-level testing, therefore, involves testing students with
test levels most appropriate to their achievement levels.
Functional-level testing can involve testing students with the
recommended test level (in-level testing), or it can mean testing
students with a test level below or above the recommended level
(out-of-level testing). Whatever the case, the goal is to test at a
level affording students the most opportunity to demonstrate their
abilities.
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SELECTING AN ACHIEVEMENT TEST* 


In selecting achievement tests for the evaluation of bilingual
programs, evaluators must consider all the same criteria that are used
in selecting any achievement test as well as additional criteria that
relate to the nature of the program and the student population. This
discussion will give most emphasis to issues in test selection that
are especially important for bilingual education evaluations.

Test Bias

During the last ten years extensive attention has been given to the 
effects of test bias for culturally different populations (War^o, 
1977; Hoiits, 1974). As a result, test publishers have made concerted 
efforts in this area and many standardized achievement tests have been 
revised. The technical manual of a test will often include a
discussion of what procedures were undertaken to minimize bias. The
two most common procedures are: (1) review of the content of the
items by a culturally sensitive panel and (2) statistical item
analyses.



* Adapted with author's permission from: "An Evaluation of Project 
Information Packages (PIPs) As Used for the Diffusion of Bilingual
Projects," RMC Research Corporation, U.S. Department of Health, 
Education and Welfare, Office of Education, 1980. 
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Review of Content -- Reading and examining the content of items may 
result in rewriting items so that they seem fairer to all groups
involved. However, a visual examination alone cannot determine if an
item is biased, i.e., that it will function differently for different
groups of students. What can be accomplished is the elimination of
stereotypical wording or content. External review panels have the
advantage of ensuring a disinterested reading, although in-house
groups may also be effective. This procedure may result in a more
acceptable test, but will not necessarily eliminate biased items.

Item Analysis -- Item analysis is a statistical procedure that is
performed routinely in test construction. The scores of students on
each item are compared to their scores on the whole test in order to
determine if each item is measuring what the whole test measures, and
in fact should be part of that test. When this procedure is used to
eliminate bias towards a specific group, the test is administered to
both the general population and to the specific group. Then item
analysis is performed in order to determine that the same items
function similarly for both groups. For example, if an item is
difficult for one group it should be difficult for the other
regardless of the mean test scores for each group. If an item is easy
for one group but difficult for another, then such an item exhibits
bias, and should probably be eliminated.
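
The idea that an item should be relatively easy or relatively hard for
both groups, regardless of each group's mean score, can be sketched as
a simple screen. This is an illustrative heuristic only, not the
procedure any test publisher is known to use. The function names, the
0.25 threshold, and the response data are all invented.

```python
def item_difficulty(responses):
    """Proportion correct for each item.

    responses: one list per student, holding 1 (right) or 0 (wrong)
    for each item.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(student[i] for student in responses) / n_students
            for i in range(n_items)]


def flag_items(group_a, group_b, threshold=0.25):
    """Return indexes of items whose relative difficulty differs
    between the two groups by more than threshold."""
    p_a = item_difficulty(group_a)
    p_b = item_difficulty(group_b)
    mean_a = sum(p_a) / len(p_a)
    mean_b = sum(p_b) / len(p_b)
    flagged = []
    for i, (a, b) in enumerate(zip(p_a, p_b)):
        # Subtract each group's own average so that a difference in
        # overall achievement does not by itself flag an item.
        gap = (a - mean_a) - (b - mean_b)
        if abs(gap) > threshold:
            flagged.append(i)
    return flagged


# Invented responses: four students per group, four items.
group_a = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0]]
group_b = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(flag_items(group_a, group_b))   # [2]
```

Here the third item (index 2) is of middling difficulty for the first
group but is missed by everyone in the second group, so it is the only
item flagged even though the second group scores lower overall.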
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Additional Selection Issues ^ 

Consideration of subtest content and weight in scoring is important 
for selecting the test that most closely matches the curriculum and 
for determining whether in-level testing is appropriate. Such issues
are important for all students, but they may be even more critical for
students of limited English proficiency. Although the curriculum of
bilingual programs may contain the same final objectives, skills such
as English reading may not be taught in the same grade levels as in
other programs.

The wording of the instructions to the test should be considered. The 
language of the instructions should not be more difficult than the 
language used in the items that actually appear in the test. Although 
directions containing needlessly complex sentence structures are a 
handicap for all students, they will cause an even greater difficulty 
for students of limited English proficiency. Examiners may want to 
consider systematically simplifying test directions, but if norms are 
to be used, this may affect their validity. 

Additionally, the content of the test should be examined to determine
the extent to which it tests the out-of-school experience of the
children. The experience of the culturally different child and of the
low SES child may differ significantly from that assumed by the
authors of the test. Therefore, the more the test relies on
out-of-school experience, the more it may discriminate against the
target population and the less valid it will be for evaluating program
impact.

Finally, if bilingual tests are used, the nature of the translation
should be considered. Some tests are direct translations except where
such a translation would clearly be impossible. Other tests provide
equivalent versions where the kinds of items and the difficulty level
are roughly equivalent, but the content of the item may be completely
different. Other tests are a combination of both methods. In a
translated test, the difficulty level may not be the same for both
versions. However, very few test publishers provide equivalent
versions.

Language of Testing 

In many bilingual education evaluations, the evaluator must decide
what testing language is appropriate. Several questions have to be
considered individually and in relation to each other. First, what is
the language of instruction for the subject that will be tested?
Because the language of instruction for math, for example, may be
different for students in the same class or may be different at
various times during the year, this question may not be answered
simply. Second, what is the dominant language of the child as
established by a systematic assessment procedure? Third, what are the
project goals? Goals may require testing in a particular language.
Ideally, of course, students should be tested in the language in which
they will perform the best. However, that language may not always be
the dominant one. For example, a student may be more fluent in
Spanish, but if almost all math instruction has been in English, the
student may perform better on an English test.

There are other issues involved in planning testing in more than one
language that have not yet been studied in sufficient detail. Some
evaluators double-test the project students, avoiding the choice of
test language by testing in both languages. The benefits of this
practice are clear: more information is obtained about the students'
proficiency in content and language and the dangers of testing only in
the weaker language are avoided. However, the additional expense, the
added burden on teachers and students, and the possibility of practice
effects represent significant disadvantages. In addition, the
language of some students may be neither standard English nor standard
Spanish.

Where tests exist in two languages, the non-English language may be
the most appropriate language for the pretest. However, after a year
of English instruction, English may be more appropriate for the
posttest. Longitudinal studies will almost certainly include scores
in both languages reported at different stages of a student's
progress. Evaluators will have to consider carefully the
interpretations of such scores.



Limits to the Usefulness of Norms 

The use of national norms as a comparison standard in an evaluation
relies on the validity of a principle known as "the equipercentile
assumption." This assumption implies that in the absence of any
special instructional treatment students in the project would have
grown at a rate comparable to that of students in the norming sample
who obtained the same mean pretest value. Such an assumption can only
be valid if the project population is similar in educationally
relevant ways to the population represented in the norming sample.
This is not usually the case in bilingual education programs, which are
generally composed of students of limited English proficiency,
bilingual students, and a larger proportion of low SES students than
is found in the general population. While the accuracy of the
equipercentile assumption for such populations has not yet been
systematically assessed, it is unlikely that norms for English
achievement tests can provide precise no-treatment expectations for
bilingual project students. There are no statistical techniques to
adjust for differences in expected growth between the project students
and the norming population (Tallmadge, 1976).
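
To make the equipercentile assumption concrete, the sketch below
computes a no-treatment expectation from a pair of norms tables: the
group's pretest mean is converted to a percentile in the fall norms,
and the spring score at that same percentile is taken as the expected
posttest. The norms values, names, and linear interpolation are all
invented for illustration; real norms tables come from the test
publisher. As the paragraph above explains, the result may not be a
valid expectation for bilingual project students.

```python
def interpolate(x, xs, ys):
    """Linear interpolation of y at x over points sorted by xs.
    Values outside the table are clamped to the end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])


# Hypothetical norms: scale score at selected percentiles.
PERCENTILES = [10, 25, 50, 75, 90]
FALL_NORMS = [300, 320, 345, 370, 390]
SPRING_NORMS = [315, 338, 365, 392, 412]


def no_treatment_expectation(pretest_mean):
    """Spring score at the same percentile the group held in the fall."""
    percentile = interpolate(pretest_mean, FALL_NORMS, PERCENTILES)
    return interpolate(percentile, PERCENTILES, SPRING_NORMS)


expected = no_treatment_expectation(320)   # fall mean at the 25th percentile
observed_posttest_mean = 350               # invented value
print(expected, observed_posttest_mean - expected)   # 338.0 12.0
```

Under the assumption, the 12-point difference would be read as project
impact. That reading holds only to the extent that the project group
resembles the norming sample in educationally relevant ways.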


Recently, data have been gathered on Spanish language achievement
tests. The most recent editions of the Comprehensive Test of Basic
Skills (CTBS) and the Inter-American Series both furnish norms tables
for English and Spanish versions of their tests, but the manner in
which such norming data were compiled limits their usefulness for
evaluating the impact of bilingual projects. The CTBS Espanol norms
were developed by administering the CTBS in both languages to a
balanced bilingual, biliterate population as determined by scores on
the SERVS test. The assumption was made that a student's standing in
the norms would be the same in English and Spanish. Students' scores
in Spanish were then equated with their rank in the English norms.
Although the assumption that a perfectly bilingual person will possess
the same knowledge of content in two languages is logical, the
possibilities for error are so large that the Spanish norm conversions
can provide only very rough estimates of student achievement. There
are several other reasons why the CTBS norms cannot be used to provide
a precise estimate of project impact. Because the scores in the norms
table are extrapolated rather than derived empirically, they are
subject to a certain amount of error inherent in any estimation
procedure. In addition, the balanced bilingual population in the
sample is not comparable to the population of most bilingual programs,
which include students with a range of language proficiencies.
Finally, because the students in the sample were in bilingual
programs, they do not provide an estimate of how similar students
would have performed without any special instruction.

The Inter-American norms were not constructed from a national
probability sample. They are "user norms" derived only from those
groups in the population to whom the Inter-American tests were
administered in the course of local evaluations. For certain tests,
the sample obtained in this way numbers over a thousand students, but
for others the N is less than 100, severely limiting the reliability
of normative data, particularly in the extreme score ranges where
estimates are based on relatively few cases. Because the norming
group was not specifically constructed to represent the population of
limited English and bilingual students, unknown biases may exist in
the sample. Because students in the sample are also in bilingual
programs, the norms do not provide an estimate of how similar students
would have performed in the absence of a special program.

The question of how a group of students would have performed without a 
bilingual project cannot be answered by simply consulting currently 
available norms. But existing norms can be used to answer other 
evaluation questions. Well-constructed norms based on national
probability samples, such as those provided by the major achievement
tests, can be used to show how the bilingual project students compare
to national averages. Norms based on more specific populations, such
as those constructed for the Spanish versions of the CTBS and the
Inter-American, can be used to show how project students compare to
the bilingual/biliterate CTBS sample or the bilingual project students
in the Inter-American sample.

Out-of-Level Testing -- The use of tests at levels below those
recommended by the publisher is an option if the content of the
program can be measured better this way. Students in bilingual
programs may be learning skills, such as English reading, at a later
time than other students and therefore should receive the same test at
a later point. In order for any test to be suitable, the average
score of the group tested should be between 1/3 and 3/4 of the maximum
(Roberts, 1976). Otherwise, ceiling or floor effects depress
estimates of student gains. Some publishers provide norms for the
administration of a single test in several grades. Other publishers
provide expanded standard scores that link up all levels of a test on
a common scale, and occasionally locator tests, to facilitate
out-of-level testing. Generally, a test should be used no more than
one level below that recommended by the publisher. But care should be
taken that in testing out-of-level, pretest floor effects are not
being replaced by posttest ceiling effects. (Note: This topic is
discussed further in a preceding document entitled "Out-of-Level or
Functional-Level Testing.")


Introduction to Test List and Summaries 

An extraordinary number of tests could be used to evaluate basic
subject areas for bilingual programs. Some of these tests are locally
developed and have not been administered to large samples of the
population. Therefore, they are less likely to have the technical
qualities required by most evaluators. Other tests are limited to
only one content area, and cannot be used by themselves to evaluate a
bilingual project which includes several content areas. Finally, many
evaluators will first consider the appropriateness of tests already in
use in the district for the evaluation of the bilingual program.
Certain tests may be mandated or choices may be constrained in other
ways. Selection of a test already being used for district-wide
assessment introduces the possibility of comparison with local
non-project students. This comparison alone cannot provide a precise
estimate of project impact, but may answer other evaluation questions,
such as how project students compare in achievement level and rate of
growth to other students in the district.

The following sections of this document are intended to provide
helpful information about tests that, for the reasons discussed above,
are already likely to be under consideration by project evaluators.
First, an annotated test list is presented which includes information
about major tests of achievement that include both math and reading or
language subtests which are available in two languages. Second, a set
of achievement test summaries, developed by the Region V Technical
Assistance Center (TAC), is provided as an additional information
resource for program directors and evaluators. Finally, a list of
publishers is provided for future reference for program directors and
evaluators seeking additional achievement test information.

Annotated Test List 

The annotated test list contains only major tests of achievement that
include both math and reading or language subtests. All such tests
available in two languages were included. Tests only available in
English were limited to those included in the Anchor Test Study
(Loret, 1974). Finally, all of the tests were discussed only as they
apply to evaluations of grades K-6.

The same categories of information are provided for each test to 
facilitate comparison. All of the tests are available from major 
publishers. Technical aspects of such tests are likely to be as good 
as the state-of-the-art. All of the tests have technical manuals
describing the process of test construction and standardization.
Except for an occasional subtest, all of the tests are designed to be
administered in groups. Administration time for each test varies
according to the number of subtests used. Subtests are listed only
where they contribute to a total score in reading, language arts, or 
mathematics, three major areas of interest to bilingual program 
evaluation. 
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. EL CIRCO 
1979

1. Languages: Spanish and English 

Spanish tests allow the test administrator to select among
alternatives the word most appropriate for the students' dialect
of Spanish.

2. Publisher's recommended in-level use: Tests can be used at
pre-school, kindergarten, and beginning of first grade.

3. Subtests:*
Cuanto y Cuantos
Para Que Sirven Las Palabras
What Words are For

Cuanto y Cuantos is a direct translation of Level A of How Much
and How Many of CIRCUS. Para Que Sirven Las Palabras and What
Words are For are equivalent, but one is not a translation of the
other. For example, each test has items testing comprehension of
the past tense but the items will have different content.

4. Norming: The El Circo measures were administered to a nationwide
sample of children from the Spanish-speaking cultural groups.
Empirical norms exist for fall only.

5. Out-of-level testing: Separate norms exist for preschool,
kindergarten, and first grade.

6. Procedures for minimizing bias: Items were reviewed by a
cultural advisory committee composed of speakers of Puerto Rican,
Mexican, and Cuban Spanish.

*Several tests have been developed as part of El Circo, but only the
ones listed are available as of spring 1980.
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California Achievement Test, 1977-78 
Forms C and D 



Languages: English

Publisher's recommended in-level use:

Level       Grade
Level 10    K.0-K.9
Level 11    K.6-1.9
Level 12    1.6-2.9
Level 13    2.6-3.9
Level 14    3.5-4.9
Level 15    4.5-5.9
Level 16    5.5-6.9

Subtest Components (Levels 10 through 16):

Pre-Reading
   Listening for Information
   Letter Forms
   Letter Names
   Letter Sounds
   Visual Discrimination
   Sound Matching

Reading
   Vocabulary
   Comprehension
   Phonic Analysis
   Structural Analysis

Language Total
   Language Mechanics
   Language Expression

Mathematics Total
   Computation
   Concepts and Applications

[The marks showing which levels include each subtest are not legible
in this copy.]

Norming: Weeks rather than midpoint dates are provided for
empirical fall and spring norms. These are the week in which
November 3rd falls, and the week in which May 4th falls. Tests
can be administered two weeks on either side of these weeks
without the use of interpolated norms.

Out-of-level testing: Provides an expanded standard score scale
and a locator test.



Procedures for minimizing bias: Test writers followed guidelines
to avoid bias in the development and editing of items. Items
were reviewed by members of various ethnic and cultural groups. An
extensive item analysis was conducted with the tryout items to
compare responses of "Black" students and "other" students. A
point biserial correlation was used to show the relation of items
to category objective scores, and grade-to-grade growth as shown
by item difficulties was also examined. The percent of biased
items found in the trial items for the various subject areas
ranged from 25 to 7 percent. After revision the percent of
biased items was reduced to the 3-0 percent range.
CIRCUS
1976 



1. Languages: English

2. Publisher's recommended in-level use:

Level       Grade

Circus A    Nursery School and Kindergarten - Fall
Circus B    Kindergarten - Spring
            First Grade - Fall
Circus C    First Grade - Spring
            Second Grade - Fall
Circus D    Second Grade - Spring
            Third Grade - Fall



3. Subtests:* 



Level 



B 



X 

X X 
X X 



X 
X 
X 



X 
X 



Pre-Reading 
Reading 

Listen to the Story
Listening

How Much and How Many 
Mathematics 
Writing Skills 

4. Norming: The Circus was administered to a national probability
sample during the fall (October) only. Therefore, the comparison
of a group to the national sample for pre- and posttesting can be
done for a fall-to-fall evaluation design only. Information is
also provided in sentence form describing what each range of
scores means in terms of skills mastered. A fall-to-spring
comparison of the proportion of students falling in each category
could be made, but would require the use of a local comparison
group to determine the normal growth expectation. Separate
tables exist for comparing groups and for comparing individuals.



*Many other subtests are provided, but only those that coordinate with
the STEP are listed here. No total scores are possible from any
combination of subtests.

The subtests listed above provide coordination through content and
expanded standard scores with the following subtests of STEP III,
Level E-3: Reading, Listening, Math Concepts and Math Computation, and
Writing Skills.
The normative data are very well suited to individual student
evaluation because the national sample is divided into subgroups
such as sex, geographic region, and SES.

Out-of-level testing: Expanded standard scores can be used for
subtests that coordinate with STEP III.

Procedures for minimizing bias: No statistical procedures are
reported. Separate norms are provided according to categories
such as sex, geographic region, and SES.
Comprehensive Test of Basic Skills
English Version 1973, Spanish Version 1978

Form S



1. Languages: English and Spanish

The CTBS/Espanol is a direct translation of the English CTBS/S
with the exception of certain items which could not be translated
or which required different translations for dialects of Spanish.
In such cases equivalent items have been constructed.

2. Publisher's recommended in-level testing:

Level      English CTBS/S    CTBS/Espanol

Level B    Grades K.6-1.9    Grade 1
Level C    Grades 1.6-2.9    Grade 2
Level 1    Grades 2.5-4.9    Grades 3 and 4
Level 2    Grades 4.5-6.9    Grades 5 and 6



3. Subtest components: 
Component 

Reading 

Word Recognition 
Reading Vocabulary 
Reading Comprehension 

Mathematics 

Math Computations 
Concepts & Applications 



Level 



XXX 
X X X X 

X X X X
X X X X 



4. Norming: The norms for the Spanish version of the CTBS were
derived through a spring testing equating this version with the
nationally representative English language norms. The
no-treatment expectation obtained by their use is not referenced
to a Limited English Proficiency population but rather to the
English language performance that could be expected from the
bilingual/biliterate population on whom the equating was done.
The scoring patterns in both English and Spanish for limited
English proficiency students may be quite different; therefore,
the norms do not present a precise standard of comparison.
Empirical norms exist for the English CTBS for spring for grades
2-6, and for fall and spring for grades K and 1.

5. Out-of-level testing: An expanded standard score scale is 
available for the CTBS/S norms. 
6. Procedures for minimizing bias: Prior to standardization these
items were reviewed by Black and Spanish-speaking consultants.
In addition, trial items were administered to a sample of Black
students and "other" students. Items with a point-biserial
coefficient of less than .2 were rejected. A subsequent analysis
was made of the test results of Black students, Spanish-speaking
students, and other students. Although the mean scores were
lower for the Black and Spanish-speaking group, the tests
appeared to be functioning similarly for both groups.
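
The item-rejection rule mentioned above (a point-biserial coefficient below .2) can be illustrated with a short Python sketch. It is not the publisher's actual procedure: the function names and the tryout data are hypothetical, and the criterion score is assumed to be each student's total score.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, criterion_scores):
    """Correlation between a 0/1 item and a continuous criterion score."""
    passed = [c for i, c in zip(item_scores, criterion_scores) if i == 1]
    failed = [c for i, c in zip(item_scores, criterion_scores) if i == 0]
    spread = pstdev(criterion_scores)  # population SD of the criterion
    if not passed or not failed or spread == 0:
        return 0.0  # coefficient undefined; treated here as no discrimination
    p = len(passed) / len(item_scores)  # proportion answering correctly
    return (mean(passed) - mean(failed)) / spread * sqrt(p * (1 - p))


def rejected_items(items, criterion_scores, cutoff=0.2):
    """Names of items whose coefficient falls below the cutoff."""
    return [name for name, scores in items.items()
            if point_biserial(scores, criterion_scores) < cutoff]


# Hypothetical tryout data for six students (not from any real test).
totals = [10, 9, 8, 3, 2, 1]
items = {
    "item_a": [1, 1, 1, 0, 0, 0],  # answered correctly by high scorers
    "item_b": [1, 0, 0, 1, 0, 1],  # unrelated to total score
}
print(rejected_items(items, totals))
```

An item that high-scoring and low-scoring students answer correctly at about the same rate yields a coefficient near zero and would be dropped under the .2 cutoff.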
Inter-American Series: Test of Reading, 1962-69
Forms CE, DE, CEs, DEs



Languages: English, Spanish, and French

Spanish version is an exact translation of English version.

Publisher's recommended in-level use:

Level      Grade

Level 1    Grade 1.5-2.5
Level 2    Grade 2.5-3.9
Level 3    Grades 4, 5, 6

Subtest components:



Level 



Components 

Vocabulary 

Comprehension 

Level of Comprehension 

Speed of Comprehension 



X 
X 



X 
X 



X 
X 



Norming: The Inter-American norms were not developed using a
probability sample. They are based on data collected from test
users. The test manual states that these norms "should be
applied with caution until local norms can be developed."
Although N's for some tests consist of more than a thousand
students, others comprise fewer than a hundred students. For
these reasons, the norms do not provide a convincing, precise
standard of comparison.

Out-of-level testing: Norms are provided for out-of-level
testing; however, the above comments regarding norms should be taken
into account.

Procedures for minimizing bias: Content was selected as being
familiar to English and Spanish speakers of the Western 
Hemisphere. A semantic frequency list was consulted in wording 
the translation, but the manual states that frequency is not 
always an indication of difficulty level. Spanish trial items 
were administered to Spanish speakers, and English trial items 
were administered to English speakers. Item analysis and item 
selection were then performed on the basis of test results. 
Inter-American Series: Test of General Ability, 1961-72
Forms CE, DE, CEs, and DEs



Languages: English and Spanish

Spanish version is an exact translation of English version.

Publisher's recommended in-level use:

Level              Grade

Preschool Level    Ages 4 and 5
Level 1            Grades end K, Grade 1
Level 2            Grades 2, 3
Level 3            Grades 4, 5, 6

Subtest components:

Components

Oral Vocabulary
Number
Association
Classification
Analogies
Sentence completion
Computation
Word Relations
Number Series



Level 



Pre- 
School 



X 
- X 
X 
X 



X 
X 
X 
X 



X 
X 

X 
X 



X 
X 
X 
X 
X 
X 



Norming: The Inter-American norms were not developed using a
probability sample; the norms are based on data collected from
test users. The test manual states that these norms "should be
applied with caution until local norms can be developed."
Although N's for some tests consist of more than a thousand
students, others comprise fewer than a hundred students. For
these reasons, the norms do not provide a convincing, precise
standard of comparison.

Out-of-level testing: Norms are provided for out-of-level
testing; however, the above comments regarding norms should be
taken into account.

Procedures for minimizing bias: Content was selected as being
familiar to English and Spanish speakers of the Western
Hemisphere. A semantic frequency list was consulted in wording
the translation, but the manual states that frequency is not
always an indication of difficulty level. Spanish trial items
were administered to Spanish speakers, and English trial items
were administered to English speakers. Item analysis and item
selection were then performed on the basis of test results.
Iowa Tests of Basic Skills, 1978
Forms 7 and 8



Languages: English

Publisher's recommended in-level use: 



Level                    Grade

Primary Battery 5        K.1-1.5
Primary Battery 6        K.8-1.9
Primary Battery 7        1.7-2.6
Primary Battery 8        2.7-3.5
Multilevel Battery 9     3
Multilevel Battery 10    4
Multilevel Battery 11    5
Multilevel Battery 12    6

Subtest components:



Reading

Reading Comprehension

Pictures

Sentences

Stories

Reading
Vocabulary
Math

Math Concepts
Math Problems

Math Computations

Math
Language

Spelling

Capitalization

Punctuation
Usage

Language
Listening



X 
X 



Forms 

*> 

7 
7 
7 
7 

and 8 
and 8 
and 8 
and 8 



Level 



8 



9 10 11 12 



X 
X 
X 
X 



X 
X 
X 
X 



X 
X 
X 
X 



Norming: Empirical norms exist for 15 October and 15 April.

Out-of-level testing: An expanded standard score scale is 
provided. 



Procedures for minimizing bias: Authors with diverse cultural 
backgrounds participated in writing of test. 



Sequential Tests of Educational Progress 
(STEP) III, 1979, Forms X and Y



1. Languages: English

2. Publisher's recommended in-level use:

Level             Grade

Intermediate E    3.5-4.5
Intermediate F    4.5-5.5
Intermediate G    5.5-6.5



Subtest components: 



Level 



E 



F 



Reading Total 


X 


X 


> 

X 


Vocabulary 




, X 


-x 


Comprehension . 


X 


X 


' X 


Inference 


X 






Math 








Mathematics Basic Concepts 


X 


X 


X 
X 


Mathematics Computations 


X 


X . 




Language: Writing Skills 


x • 


X 


• X 


Spelling


. X 


X 


X 


Capitalization






X 


Word Structure and Usage 


X 


X 




Sentence and Paragraph Organization 


X 


X 


X 


Language: Listening 




X 


X 


Listening Comprehension 


X 




X 


Following Directions 


X 


X . 





Norming: Empirical norms are available for fall and spring. 
Midpoints of the norming periods are 5 October and 10 May. 

Out-of-level testing: Provides expanded standard score scale and
also out-of-level norms. Has locator test.

Procedures for minimizing bias: Items were edited by in-house
minority and women test specialists, and by an external minority
review panel.

Additional comments: Can be used in conjunction with CIRCUS,
1978, because of the coordination of test content and an expanded
standard score scale.
Metropolitan Achievement Tests
(MAT) 1976 Forms JI and KI



Languages: English

Publisher's recommended in-level use:

Level           Grade

Primer          K.5-1.4
Primary 1       1.5-2.4
Primary 2       2.5-3.4
Elementary      3.5-4.9
Intermediate    5.0-6.9

Subtest components 

Primary Primary Elemen- Inter 
Primer 1 2 tary mediate 

Reading 

Comprehension* 
Language 

Listening 

Comprehension 

Punctuation and 

Capitalization 

Usage 

Grammar and Syntax 
Spelling
Study Skills
Math

Numeration
Geometry and
Measurement
Problem Solving
Operations: Whole
Numbers

Operations: Laws
and Properties
Operations: Fractions
& Decimals
Graphs & Statistics

4. Norming: Empirical fall and spring norms have been
developed with midpoints of 15 October and 20 April
respectively.



X 
X 

X 

X 



X 
X 
X 
X 
X 



X 
X 



X 
X 
X 
X 
X 



X 
X 



X 
X 
X 
X 
X 



X 
X 

X 

X 



X 
X 
X 
X 
X 



X 
X 

X 

X 

X 
X 



* Additional reading subtests such as rate and auditory
discrimination are available, but they are not part of the
comprehension score.



5. Out-of-level testing: Provides an expanded standard score
scale. Out-of-level testing should be no more than one level
below that recommended for the grade.

6. A combination of objective and subjective methods was used
to identify ethnically biased items on the MAT. Following
review by a panel of ethnically diverse educators, test items
were examined for bias using three conceptually different
statistical methods. Items tagged as biased by either the
subjective or objective procedures were subsequently revised
or eliminated.
SRA Achievement Series, 1978
Forms 1 and 2



Languages: English

Publisher's recommended in-level use:

Level    Grade

A        K.5-1.5
B        1.5-2.5
C        2.5-3.5
D        3.5-4.5
E        4.5-6.5

Subtest components: 



Level 



C 



D 



X 
X 
X 
X 



X 
X 
X 
X 
X 

X 
X 



X 
X 
X 
X 

X 
X 



X 
X 
X 



X 
X 

X 
X 



X 
X 
X 



X 
X 

X 
X 
X 

X 
X 
X 



Component 

Reading 

Visual Discrimination 

Auditory Discrimination 

Letters/Sounds 

Listening Comprehension 

Vocabulary 

Comprehension 
Mathematics 

Concepts 

Computation 

Problem Solving 
Language Arts 

Mechanics 

Usage 

Spelling

Norming: The norms are based on a nationally representative 
sample of students. Empirical spring norms are available with 
temporary fall interpolated norms. Empirical fall norms are 
currently being developed. Empirical fall and spring norming 
dates are: 7 October and 25 April. 

Out-of-level testing: Out-of-level testing can be interpreted
using the SRA expanded standard score scale known as GSV (Growth
Scale Value).

Procedures for minimizing bias: Items were edited by
representatives of minority and women's groups. The trial items
were administered to a sample that included Black, Hispanic,
American Indian, and non-minority subsamples. The items were
then examined statistically and items which were easy for one
group but difficult for another were eliminated.
Stanford Achievement Test, 1973
Forms A, B, and C



1. Languages: English

2. Publisher's recommended in-level use:

Level              Grade

Primary I          1.5-2.4
Primary II         2.5-3.4
Primary III        3.5-4.4
Intermediate I     4.5-5.4
Intermediate II    5.5-6.9

3. Subtest components:

Primary I   Primary II   Primary III   Intermediate I   Intermediate II

Total Reading 
Reading Compre- 
hension X X X a X 
'Word Study Skills ' X X X x 

Total Mathematics 

Cbncepts. X» X X X x . 

Computation and 

Applications X v , , 

Computation X X *, . X X 

Applications ' k X X / X X 

Total Auditory ' 
Vocabulary X ..>X X X x 

Listening Com- f . 5 £ 

prehension . X '*> ;X ; ; XXX 

4. Norming: Empirical norms are available with a midpoint of 8
October for grades 2-9, 8 May for grades 1-9, and 8 February
for grades 1 and 2.

5. Out-of-level testing: Provides an expanded standard score scale.
Testing more than one level out-of-level is not recommended.

6. Procedures for minimizing bias: Items were edited by a group of
consultants with various minority backgrounds.

7. Other comments: The scaled score is continuous with Stanford
Early School Achievement Test (SESAT) and Stanford Test of Academic
Skills (TASK).
Test of Basic Experience II
(TOBE) 1978



Languages: English and Spanish

The Spanish version is a direct translation from the English with
the exception of items that would radically change in
translation. In such cases equivalent items were constructed.
The Spanish version of the test occasionally provides a choice of
words so that the most common version of words can be used with
Mexican, Cuban, and Puerto Rican students.

Publisher's recommended in-level use:

Level    Grade

K        Preschool, kindergarten, fall of first grade
L        Spring of kindergarten, first grade

Subtests:

                   Level K    Level L

Mathematics           X          X
Language              X          X
Science               X          X
Social Sciences       X          X

Norming: Empirical norms exist only for the English version of 
the test; midpoints are October 19 and April 19. 

Out-of-level testing: Provides expanded standard score scales.

Procedures for minimizing bias: Test items were reviewed by a
panel of women and minority consultants. The Spanish version of
the test was reviewed by native speakers of Puerto Rican, Cuban,
and Mexican Spanish.
Achievement Test Summaries

Test information summaries were developed by the Region V Technical
Assistance Center (TAC) to serve as an information resource for
evaluators. The information included in these summaries focuses on
the use of norm-referenced tests in Reading, Language Arts, and
Mathematics. Test Information Center staff of Region V TAC prepared
42 summaries (19 are included in this document) in response to
requests for information about the tests. Preparation of these
summaries was not intended to imply endorsement or approval of any
test.

The test summaries, intended to serve as a guide to the use of the
publishers' tests and test publications, may be used for various
purposes such as: test familiarization and/or selection; identifying
the test publications which provide information on norming,
reliability, and validity; selecting appropriate test levels for
functional level testing; scheduling test administrations; and
identifying the publications which contain the required norms tables
as well as the names of the specific score conversion tables to be
used.

Test summaries are revised on a periodic basis as new test information 
becomes available. The following summaries were most recently revised 
in early 1981 and were reviewed by the respective test publishers. 
1. Test (Series)/Year : CALIFORNIA ACHIEVEMENT TESTS, 1970
Forms : A and B

Publisher/Distributor : CTB/McGraw-Hill
Authors : E.W. Tiegs and W.W. Clark

Description : A series of academic achievement test batteries
designed for measurement, evaluation, and analysis
of school achievement. The emphasis is upon
content and objectives in areas of Reading,
Language, and Mathematics.

2. Test (Series)/Year : CALIFORNIA ACHIEVEMENT TESTS, 1977-78
Forms : C and D

Publisher/Distributor : CTB/McGraw-Hill
Authors : CTB/McGraw-Hill Test Development Staff

Description : Achievement test battery designed to measure
knowledge of understanding in Reading,
Mathematics, Language, Spelling, and
Reference Skills. Levels 10-12 are
available in Form C only.

3. Test (Series)/Year : COMPREHENSIVE TESTS OF BASIC SKILLS, 1968
Forms : Q and R

Publisher/Distributor : CTB/McGraw-Hill
Authors : Staff of CTB/McGraw-Hill

Description : Achievement test battery designed to measure
skills in Reading, Language, Arithmetic, and
Study Skills.

4. Test (Series)/Year : COMPREHENSIVE TESTS OF BASIC SKILLS,
1973-75
Forms : S (73) and T (75)

Publisher/Distributor : CTB/McGraw-Hill
Authors : Staff of CTB/McGraw-Hill

Description : Achievement test battery designed to measure
skills prerequisite to studying and learning
in subject matter courses: Reading, Language,
Mathematics, Reference Skills, Science, and
Social Studies.

5. Test (Series)/Year : GATES-MACGINITIE READING TESTS, 1965-72
Forms : 1, 2, 3

Publisher/Distributor : Riverside Publishing Company,
Division of Houghton Mifflin
Authors : A.I. Gates and W.H. MacGinitie

Description : A series of tests designed to measure
group and individual reading achievement
from kindergarten through grade 12.

6. Test (Series)/Year : GATES-MACGINITIE READING TESTS, 1978
Forms : 1, 2, 3

Publisher/Distributor : Riverside Publishing Company,
Division of Houghton Mifflin
Author : W.H. MacGinitie

Description : A series of tests designed to measure reading
achievement of children in Grades 1 through 12.
7. Test (Series)/Year : IOWA TESTS OF BASIC SKILLS, 1971
Forms : 5 and 6

Publisher/Distributor : Riverside Publishing Company,
Division of Houghton Mifflin
Authors : A.N. Hieronymus and E.F. Lindquist

Description : Provides for comprehensive measurement in
fundamental skills: Vocabulary, Reading,
Language, Work-Study Skills (maps, graphs
and tables, and references), and
Mathematics.

8. Test (Series)/Year : IOWA TESTS OF BASIC SKILLS, 1978
Forms : 7 and 8

Publisher/Distributor : Riverside Publishing Company,
Division of Houghton Mifflin
Authors : A.N. Hieronymus, E.F. Lindquist, and H.D. Hoover

Description : A new test edition designed to provide
comprehensive assessment of student
achievement in important areas of basic
skills. Normed concurrently with Tests of
Achievement and Proficiency, Form T, 1978,
and the Cognitive Abilities Test, Form 3,
1978. The expanded standard score of ITBS,
1978 is continuous with that of TAP, Form T.

9. Test (Series)/Year : METROPOLITAN ACHIEVEMENT TESTS, 1970
Forms : F, G, H

Publisher/Distributor : Psychological Corporation
Authors : W.N. Durost, H.H. Bixler, J.W. Wrightstone,
G.A. Prescott, and I.H. Balow

Description : Designed to assess achievement in the
important skill and content areas of the
school curriculum in kindergarten through
junior high.
10. Test (Series)/Year : METROPOLITAN ACHIEVEMENT TESTS, 1978,
INSTRUCTIONAL BATTERY
Forms : JI and KI

Publisher/Distributor : Psychological Corporation
Authors : I.H. Balow, R. Farr, T.P. Hogan, and G.A. Prescott

Description : A nationally normed, criterion-referenced
test battery in Reading, Mathematics, and
Language. Each subject area includes
learning strands, each of which is
represented by a test. Empirical norm-referenced
scores are available for each
subtest and for Total Mathematics and
Total Language.

11. Test (Series)/Year : METROPOLITAN ACHIEVEMENT TESTS, 1978,
SURVEY BATTERY
Forms : JS and KS

Publisher/Distributor : Psychological Corporation
Authors : I.H. Balow, R. Farr, T.P. Hogan, and G.A. Prescott

Description : Norm-referenced survey tests in Reading
Comprehension, Mathematics, Language,
Social Studies, and Science.

12. Test (Series)/Year : METROPOLITAN READINESS TESTS, 1974-76
Forms : P and Q

Publisher/Distributor : Psychological Corporation
Authors : J.R. Nurss and M.E. McGauvran

Description : The MRT is designed to assess
readiness to begin formal learning by
measuring pre-reading and pre-mathematics
skills. It is a readiness test battery only
and has no provision for testing beyond the
beginning of grade 1.
13. Test (Series)/Year : PEABODY INDIVIDUAL ACHIEVEMENT TEST, 1970
Forms : One only

Publisher/Distributor : American Guidance Service, Inc.
Authors : L.M. Dunn and F.C. Markwardt, Jr.

Description : A wide-range, individually administered
screening test of achievement.

14. Test (Series)/Year : STANFORD ACHIEVEMENT TEST, 1973
Forms : A and B

Publisher/Distributor : Psychological Corporation
Authors : R. Madden, E.F. Gardner, H.C. Rudman,
B. Karlsen, and J.C. Merwin

Description : Academic achievement test battery whose
content areas include: Reading, Language
Arts, Mathematics, Science, and Social
Science. Expanded standard (scaled)
scores are continuous with Stanford Early
School Achievement Test (SESAT) and
Stanford Test of Academic Skills (TASK).

15. Test (Series)/Year : STANFORD EARLY SCHOOL ACHIEVEMENT TEST,
1967-70
Forms : One only

Publisher/Distributor : Psychological Corporation
Authors : R. Madden and E.F. Gardner

Description : A group-administered test designed to
measure children's cognitive abilities upon
entrance to kindergarten and during kindergarten
and first grade. Expanded standard
(scaled) score is continuous with Stanford
Achievement Test, 1973.



16. Test (Series)/Year : TESTS OF ACHIEVEMENT AND PROFICIENCY, 1978
Forms : T

Publisher/Distributor : Riverside Publishing Company,
Division of Houghton Mifflin
Authors : D.P. Scanell, O.M. Haugh, A.H. Schild, and G. Ulmer

Description : Designed to provide comprehensive appraisal
of student achievement for widely accepted
secondary-school goals in basic skills and
curricular areas. Normed concurrently with
Iowa Tests of Basic Skills, Forms 7 and 8, 1978,
to provide extended measurement in grades
9-12. TAP was also normed concurrently with
the Cognitive Abilities Test, Form 3.

17. Test (Series)/Year : TESTS OF BASIC EXPERIENCES, 1970-75,
FIRST EDITION
Forms : One only

Publisher/Distributor : CTB/McGraw-Hill
Authors : M.H. Moss

Description : TOBE measures children's acquisition of
the concepts and experiences considered
necessary for participation in the early
years of school. TOBE has two overlapping
levels (K and L) which span preschool
through grade 1. Each level has five tests:
Language, Mathematics, Science, Social
Studies, and General Concepts. Each test
item consists of a verbal stimulus and
four picture responses. As the examiner
reads the stimulus aloud, the child makes
a mark over the picture (or inside a bubble)
that he/she believes is the correct response.

18. Test (Series)/Year : TESTS OF BASIC EXPERIENCES, 1978,
SECOND EDITION
Forms : One only

Publisher/Distributor : CTB/McGraw-Hill
Authors : M.H. Moss
Description : TOBE 2 measures children's acquisition of
the concepts and experiences considered
necessary for participation in the early
years of school. TOBE 2, which spans
preschool through grade 1, has two overlapping
levels, K and L. Each level has
four tests: Language, Mathematics, Social
Studies, and Science. Each test item
consists of a verbal stimulus and four
picture responses. As the examiner
reads the stimulus aloud, the child fills
in an answer space indicating the picture
that he/she believes is the correct
response.

19. Test (Series)/Year : WIDE RANGE ACHIEVEMENT TEST, 1978
Forms : One

Publisher/Distributor : Jastak Associates, Inc.
Authors : J.F. Jastak, S.W. Bijou, and S.R. Jastak

Description : A two-level wide range test comprised of
three subtests: Reading (recognizing and
naming letters and pronouncing words out of
context); Spelling (copying marks
resembling letters, writing the name, and
writing single words to dictation); and
Arithmetic (counting, reading number
symbols, solving oral problems, and
performing written computations). The
test, basically a clinical type test,
consists of one four-page test booklet
which includes both levels.
List of Test Publishers

Publisher                                 Tests

Addison-Wesley Testing Service            CIRCUS
2725 Sand Hill Road                       Cooperative Primary Tests
Menlo Park, CA 94025                      Sequential Tests of Educational
Telephone: 415-854-0300                     Progress (STEP)

American Guidance Service, Inc.           Key Math Diagnostic Arithmetic Test
Publishers' Building                      Peabody Individual Achievement Test
Circle Pines, MN 55014                    Woodcock Reading Mastery Tests
Telephone: 612-786-4343

CTB/McGraw-Hill                           California Achievement Tests
Del Monte Research Park                   Comprehensive Tests of Basic Skills
Monterey, CA 93940                        Diagnostic Mathematics Inventory
Telephone: 408-649-84rO                   Prescriptive Reading Inventory
                                          Tests of Basic Experiences

Education Progress, Division of           Individualized Criterion
Educational Development Corporation         Referenced Testing
4235 South Memorial
Tulsa, OK 74145
Telephone: 918-622-4522

Jastak Associates, Inc.                   Wide Range Achievement Test
1526 Gilpin Avenue
Wilmington, DE 19806
Telephone: 302-652-9990

Psychological Corporation                 Durrell Listening-Reading Series
757 Third Avenue                          Iowa Silent Reading Tests
New York City, NY 10017                   Metropolitan Achievement Tests
Telephone: 212-888-3500                   Metropolitan Readiness Test
                                          Stanford Achievement Test
                                          Stanford Diagnostic Mathematics Test
                                          Stanford Diagnostic Reading Test
                                          Stanford Early School Achievement Test
                                          Stanford Test of Academic Skills (TASK)

Riverside Publishing Company              Gates-MacGinitie Reading Tests
1919 S. Highland Avenue                   Iowa Tests of Basic Skills
Lombard, IL 60148                         Nelson Reading Skills Tests
Telephone: 312-629-9700                   Tests of Academic Progress
                                          Tests of Achievement and Proficiency
Science Research Associates, Inc.         Iowa Tests of Educational Development
155 North Wacker Drive                    National Educational Development Tests
Chicago, IL 60606                         SRA Achievement Series
Telephone: 312-984-2195

Scholastic Testing Service                STS Educational Development Series
480 Meyer Road
Bensenville, IL 60106
Telephone: 312-766-7150

Scott, Foresman and Company               Comprehensive Assessment Program
1900 E. Lake Avenue                         Achievement Series
Glenview, IL 60025
Telephone: 312-729-3000

Teaching Resources Corporation            Woodcock-Johnson Psycho-Educational
50 Pond Park Road                           Battery
Hingham, MA 02043
Telephone: 617-749-9461
Annotated List of Language Proficiency Tests

This annotated list of eight language proficiency tests is intended to
provide project directors and evaluators with the information
necessary to make a well-informed choice in selecting a language
proficiency test. The criterion used for including tests in the list
is the following: each test is recommended (at the time of printing)
by at least one of the three states having the largest number of
bilingual education programs.

The tests are primarily in Spanish and English and range from
kindergarten level to high school. A brief description is offered of
each test as well as comments on the linguistic and technical
properties of the tests. The comments are points that evaluators and
project directors should be well aware of in selecting a test or in
interpreting test results. The comments were drawn from several
sources including the experience of districts in the bilingual PIP
field test study, and published articles and critiques. Each
publisher was given an opportunity to respond to the review and to
include "Publisher's Comments." This information has been
incorporated into the reviews.
Basic Inventory of Natural Language (BINL)

Languages : English and Spanish (can be used for other languages)
What It Tests : Speaking
Levels and Grades : K-12

Administration : Individually administered. Requires 10-15 minutes.

Pictures are used to elicit natural speech and ten sentences are tape
recorded for later analysis.

Scoring : Hand or machine scored.

Interpretation : Yields raw scores that can be converted to one of four
levels: NES, LES, FES, PES ("proficient").

Age is taken into account in determining levels.

Comments : Pictures are large, attractive, with multicultural content.
It is difficult to standardize administration procedures since there
is no set of "items" but rather an elicitation technique. Complex to
score by hand. Scored on the basis of linguistic complexity and
length of sentences. These criteria may not always be valid
indicators of proficiency.

No information is provided on the validity of the proficiency
categories. Information on validity is limited to correlations of
sentence length with complexity, and correlations of complexity scores
with an oral reading test. Reliability data are limited to
correlations between the first half and the second half of the test.
These correlations were high. Some districts have found that the test
classifies fluent speakers as "limited" (see Gilmore and Dickerson,
1979).
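
The kind of first-half/second-half reliability check mentioned above can be sketched as follows. This is an illustrative assumption about how such a correlation might be computed, not the BINL publisher's procedure: the function names are hypothetical, each student is assumed to have one score per recorded sentence, and the Spearman-Brown step is a standard adjustment that the source does not report using.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # assumes neither half has zero variance


def split_half(sentence_scores):
    """Correlate first-half and second-half totals across students.

    sentence_scores holds one list of per-sentence scores per student.
    """
    first, second = [], []
    for scores in sentence_scores:
        mid = len(scores) // 2
        first.append(sum(scores[:mid]))
        second.append(sum(scores[mid:]))
    return pearson(first, second)


def spearman_brown(r_half):
    """Project a half-test correlation to full test length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores: three students, ten sentences each.
students = [[1] * 5 + [2] * 5, [2] * 5 + [4] * 5, [3] * 5 + [6] * 5]
r = split_half(students)
print(r, spearman_brown(r))
```

Note that a high split-half correlation speaks only to internal consistency; it does not address the validity concerns raised in the comments above.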

Publisher's Comment : Standardization is facilitated by adequate
training and close adherence to BINL procedures. Machine scoring
procedures: reports of five different types, from classroom listings
to district summaries, including pre-post averages, minimum, maximum,
and average scores by grade levels. A recent study establishes
averages for grades K-12 based on a sample of 125,000 students.
Standard error allows for valid adjustment of scores. The format of
the test permits retest on invalid tests, which have been reported to
be less than 4% of tests submitted for machine scoring. Percentile
rank of scores is now included in reports.
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Bilingual Syntax Measure (BSM)

Languages : English and Spanish
What It Tests : Speaking 

Levels and Grades : Level I, K-2 (ages 4 to 9); Level II (not
available for review)

Administration : Individually administered.
Requires 10-15 minutes.

Students respond orally to questions based on pictures.

Scoring : Hand scored.

Interpretation: Provides language dominance (when both English and
Spanish tests are administered), level of second language acquisition,
and degree of maintenance or loss of the first language. Assigns
students to one of five proficiency levels in each language.
Additionally, provides instructional suggestions for reading and ESL
which correspond to each of the five English proficiency levels.

Comments: Attractive, colorful pictures are used to elicit speech
through structured conversation. Responses are scored strictly on the
correctness of specific grammatical structures. The choice of
grammatical structures is based on research studies on the sequence of
acquisition of morphemes. Allows for regional language variation. A
number of discussions of this test have been published, including
Hernandez-Chavez, 1978 (1) and Rosansky, 1979 (2).

Both test-retest reliability and interscorer reliability are reported
in the Technical Handbook. Although the reported reliability is low,
the authors attempt to explain why this is so (TH, p. V).



(1) Hernandez-Chavez, Eduardo. Critique of a critique: Issues
in Language Assessment. Journal of the National Association for
Bilingual Education, March 1978, Vol. II, No. 2.

(2) Rosansky, E.J. A Review of the Bilingual Syntax Measure.
In: Advances in Language Testing, edited by B. Spolsky. Arlington,
VA: Center for Applied Linguistics, 1979.
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Comprehensive English Language Test for Speakers of English as a
Second Language (CELT)

Language : English

What It Tests : Listening comprehension, grammar, and vocabulary.
Contains three subtests: (1) Listening, (2) Structure, and (3)
Vocabulary.

Levels and Grades : High school, college, and adult. 
Designed for intermediate to advanced ESL students. 
Administration : Group administered. 

Listening requires 40 minutes; Structure requires 45 minutes;
Vocabulary requires 35 minutes. A recording can be used to administer
the listening test.

All test items are multiple choice; students respond to oral and
written stimuli by marking an answer sheet.

Scoring : Scored with a key.

Interpretation : Yields percent correct for each test.

Percentile scores are available (but see Comments).

Does not provide proficiency classifications. No cutoff score is 
provided for classification of students as limited in English 
proficiency, since test was not designed for this purpose. 

Comments : Oral production is not tested.

All test items on each subtest are multiple choice items that require
reading; therefore, the measures of listening comprehension,
structure, and vocabulary are each confounded with literacy skills.
The authors recommend the Vocabulary subtest for use with students who
have had advanced training in reading.

The three subtests had moderate to high internal consistencies with
four groups of foreign students and, therefore, very reasonable
standard errors of measurement. No information is given on predictive
validity. Tentative evidence of concurrent validity is offered based
on correlations with other standard ESL tests. Tentative norms for
five different groups, based on small samples, are provided. The
norms are not appropriate for use in most bilingual programs, however,
since the students in the norming sample are not similar to most
students in bilingual programs.
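
The step from internal consistency to the standard error of measurement rests on the standard classical-test-theory relation SEM = SD x sqrt(1 - reliability). A minimal sketch follows, assuming a hypothetical subtest standard deviation and hypothetical reliability values; the review does not report the CELT figures, and the function name is illustrative only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical values for illustration only (not taken from the CELT manual).
sd = 10.0
for r in (0.75, 0.85, 0.95):
    sem = standard_error_of_measurement(sd, r)
    print(f"reliability={r:.2f}  SEM={sem:.2f}")
```

The higher the reliability, the smaller the band of uncertainty around an individual student's observed score.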
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Ilyin Oral Interview Test

Languages : English

What It Tests : Speaking

Levels and Grades : Secondary and adult.

Forms : There are two forms (BILL and TOM) and each has a long version
(50 items) and a short version (30 items).

Administration : Individually administered. Requires up to 30
minutes.

Students respond orally to pictorial stimuli and questions. Items are
ordered in difficulty and the interview is terminated when a
frustration level is reached.

Scoring : Hand scored.

Interpretation: Yields raw scores. No cutoff score is given to
identify students as "limited" in English proficiency; however,
suggestions are given for placement levels in adult ESL programs, and
a range is suggested as the degree of proficiency required for jobs in
which oral communication with the public is limited.

Comments : The requirement to answer in a complete sentence is an
unnatural one and may depress scores of students who fail to do this.
The long version can become monotonous since many pictures are
repeated.

Internal consistency reliabilities are high. No information is given
for test-retest reliability or interrater reliability. Validity
information is limited to correlations with other tests and based on
very small samples.
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Language Assessment Battery (LAB) 
Languages : English and Spanish

What It Tests : Listening, speaking, reading, and writing. 

Level I has three subtests: (1) Listening and Speaking, (2) Reading,
and (3) Writing. Levels II and III have four subtests: (1) Listening,
(2) Reading, (3) Writing, and (4) Speaking.

Levels and Grades : Level I, grades K-2; Level II, grades 3-6; Level
III, grades 7-12.

Administration : Level I: Individually administered; requires 5-10
minutes.

Levels II and III: Part is individually administered; requires 41
minutes.

Students respond to verbal, written, and pictorial stimuli by
pointing, by giving oral responses, by writing, and by marking answer
sheets (on Levels II and III only).

Scoring : Hand scored; parts scored with a key.

Interpretation : Yields raw scores, stanines, and percentiles by
grade. Students scoring below the 20th percentile may be classified
as limited in English proficiency.

Comments : The speaking section of Level I, Test 1, contains only 6
items, all of which may be answered with one word. The writing tests
measure reading skills in addition to writing skills.

The test went through all the stages of preparation by expert and
experienced item writers, pilot studies, item and test analyses, and
norming on substantial samples (20 schools, and about 500 students at
each level from K through 12). The technical manual is a model.

One study (1) has shown that the Level I English test does not
discriminate well in the range near the cutoff point for classifying
students as limited in English. This reduces its value for use as a
pre-post measure.



(1) Hubert, J. "An Investigation of the Language Assessment
Battery (English, Level 1) for Title VII students in Hartford."
Unpublished manuscript, 1978.
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Language Assessment Scales (LAS) 
Languages : English and Spanish 

What It Tests: Listening comprehension and speaking. Five subtests
form the total score for both levels: (1) discrimination of minimal
phonemic pairs, (2) vocabulary production, (3) phoneme production, (4)
syntax comprehension, and (5) story production.

Levels and Grades : Level I, grades K-5; Level II, grades 6-12.

Administration : Individually administered.
Requires 20 minutes.

Stimuli consist of tape-recorded speech and pictures. Students
respond orally and by pointing.

Scoring : Hand scored.

Interrater reliability should be obtained on the storytelling task.
Age is taken into account in scoring.

Interpretation : Yields a score of 1 to 100 which can be converted to
a level, 1 to 5.

Students who score at level 3 or below are classified as "Limited
English (or Spanish) speakers."

Comments: This is a fairly comprehensive overall aural-oral
proficiency test. There are problems with the phonemic discrimination
section since this task requires a kind of metalinguistic awareness 
students may not have. The story retelling task measures not only 
production, but also comprehension. 

Interrater reliability coefficients for the story retelling task are 
moderately high. Coefficients of internal item consistency for 
discrete-point items range from .36 to .96. 

Validation consisted of one-way analyses of variance of relatively
small samples (one to two hundred) of students dichotomized into
English-dominant and Spanish-dominant on the basis of teacher
judgment.

Several studies of reliability were done on small samples (21 English
and 35 Spanish) using various approaches. The sample sizes were too
small to justify some of the analyses and the conclusions drawn from
them.
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Primary Acquisition of Language (PAL) Oral Language Dominance Measure
(OLDM), Oral Language Proficiency Measure (OLPM)

Languages : English and Spanish

What It Tests: Listening comprehension and speaking

Levels and Grades : PAL OLDM, K-3; OLPM, 4-6

Administration : Individually administered.
Requires 15 minutes for each language.

Students respond orally to oral and pictorial stimuli.

Scoring : Hand scored.

Interpretation : Yields raw scores ("G scores") that are converted to
proficiency levels, 1 to 5. Also yields dominance categories.

Students who score at level 4 or below are classified as "Limited
English (or Spanish) speakers."

Comments : Simple to use and score. Scored on the basis of
grammaticality and appropriateness of responses as well as quantity of
speech.

The test was developed "as a result of research by the El Paso Public 
Schools." 

Item analyses were used in the construction of the tests although
samples were somewhat small (about 200 drawn from three grades in high
schools). Validity is quoted in terms of the test's ability to grade
schools in correct order and of correlations with a reading test.
The latter were fair, being around 0.3 to 0.5.
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Shutt Primary Language Indicator Test (SPLIT)

Languages : English and Spanish

What It Tests : Listening comprehension, speaking, reading, and
grammar.

There are three subtests: (1) Listening Comprehension, (2) Verbal
Fluency, and (3) Reading Comprehension and Grammar.

Levels and Grades : Listening Comprehension, Verbal Fluency, K-6;
Reading Comprehension and Grammar, 3-6.

Administration : Listening Comprehension: Group administered;
requires 35 minutes; tape recording available.

Verbal Fluency: Individually administered; requires 15 minutes.

Reading Comprehension and Grammar: Group administered; requires 30
minutes.

Students respond by marking pictures in an answer book or by marking
an answer sheet.

Scoring : Hand scored; parts scored with a key.

Interpretation : Yields raw scores, percentile ranks, and age and 
grade equivalents. 

Yields a dominance classification. 

Comments: Yields no cutoff point to classify students as limited in
English proficiency (independent of Spanish/Portuguese score). A
deficiency classification is given based on the dominance
classification. This wrongly assumes that students are highly
proficient in their dominant language. A student whose English score is
very low can be classified as "English Adequate" if the student's
Spanish score is also very low, but higher than the English score.
Districts should establish their own cutoff points for classifying
students in English.
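
The problem with a classification that depends on the gap between two language scores, and the alternative of a locally set cutoff, can be sketched as follows. Both functions, the margin, the cutoff, the "English Deficient" label and the scores are hypothetical illustrations; they are not SPLIT's actual scoring rules, which are not reproduced in this review.

```python
def relative_classification(english, spanish, margin=5):
    """HYPOTHETICAL dominance-style rule: compares the two scores only.

    A student is flagged only when Spanish exceeds English by more than
    `margin`, regardless of how low the English score itself is.
    """
    if spanish - english > margin:
        return "English Deficient"
    return "English Adequate"


def district_classification(english, cutoff):
    """Classify on the English score alone, using a district-set cutoff."""
    return "Limited" if english < cutoff else "Not limited"


# Hypothetical student with very low scores in both languages.
student = {"english": 10, "spanish": 12}

print(relative_classification(student["english"], student["spanish"]))
print(district_classification(student["english"], cutoff=40))
```

Under the relative rule this student is not flagged, because the two low scores are close together. The absolute cutoff on the English score identifies the student as limited.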

Grade equivalent scores should not be used. 
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MEASURING SELF-CONCEPT 

Research investigating the relationship between ethnicity and student
self-concept is mixed and inconclusive. The mixed findings may be
attributed to many factors, including the fact that different
researchers have measured different dimensions of self-concept and
compared them as if they were the same. The end result is that the
relationship between ethnicity and self-concept is still vague and
needs more careful study.

The concept of self is basically derived from (1) the responses made
toward the individual by significant people in his immediate
environment, (2) his perception of their behavior towards him, (3) the
internalization of his perception into a coherent set of self-views,
(4) the resultant self which he perceives as reflected back into the
eyes of others, (5) the reinforcement of that self as seen by him and
by others, as well as by his view of their concepts of him, and
(6) his responses to the challenges and pressures of living.

The need to address cultural differences as a factor when measuring
the self-concept of minority students was addressed by Whiting (1974)
when he developed a series of self-concept measures. In stressing the
need for culturally sensitive instruments, Whiting pointed out that
tests should be designed with a particular population in mind, taking
into account that population's values and concerns in order to



Adapted with author's permission from: Ratliff, Stanley. Working
Papers from the Bueno Center for Bilingual/Multicultural Education,
School of Education, University of Colorado at Boulder.
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measure self-concept more accurately. Whiting goes on to describe a
battery of self-concept instruments developed around the
multi-dimensionality of self-concept. He described self-concept in
the following manner:

o Self-esteem refers to how an individual evaluates
himself and indicates the extent to which he
believes himself to be capable, significant,
successful, and worthy. Within the context of
this definition, the individual arrives at an
evaluation of his own worthiness by examining his
performance, capacities, and attributes in light
of his own personal standards and values. Thus,
self-esteem is a "personal judgement of worthiness
that is expressed in the attitudes the individual
holds toward himself."

o Sense of control refers to how much an individual
accepts responsibility for his own actions, or
whether he attributes power and control to various
external agents, such as adults, peers, luck,
brothers or sisters, the "system," or fate. If a
child has little sense of control, he also has
little sense of responsibility, since the two are
so closely related.

o Academic self-concept refers to how an individual
evaluates his ability to function successfully in a
school environment.

o Social self-concept refers to how an individual
thinks the people who are significant in his life
perceive him.



Special Problems of Measuring Self-Concept of Culturally Different
Children



The measuring of self-concept and attitudes is complicated by the need 
to consider the distinct cultural background of many of the students 
participating in the program. Thus, a published instrument with 
acceptable levels of validity and reliability may not be appropriate 

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Because of the issues above, the best means for measuring desired
change in self-concept would be (1) pre- and posttesting with published
tests normed on minority groups similar to the students in the
bilingual program and (2) to use informal, teacher-developed formative
measures using a variety of approaches, including paper and pencil and
picture instruments, teacher observation guides, and parent
questionnaires.

The reason for monitoring changes in self-concept centers around the 
relationship of positive changes in self-concept and success in 
school, as well as success in social relationships. Since it is 
important to obtain an accurate assessment of what is happening to a 
student's self-concept while he is a participant in the program, it is 
necessary to take readings on self-concept fairly often and in a 
variety of ways. An adequate system for measuring self-concept would
include teacher-made tests, as well as published instruments. Picture
tests for non-readers, as well as paper and pencil tests for older
children, should be a part of the testing program. Projective
techniques where students determine concurrence between self-ratings
and the ratings of others are powerful means of helping students
become aware of themselves in terms of how "others see me." However,
certain cautions should be taken when utilizing self-reporting
instruments. Results should be kept confidential, instruments should 
be administered in a non-threatening manner, testers should point out 
that there are no right or wrong answers and testers should read items 
to very young students. 

Teacher-Developed Self-Concept Instruments 

Teacher-developed instruments offer several advantages in that (1) 
they may include items unique to the community, program, school or 
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classroom, (2) the instrument may include items that take into account
any cultural variables and (3) teachers will be more positive toward
instruments that are locally developed.

Among the behavioral indicators a teacher might look for would be:

o How does the student react to a new situation?

o How does the student react to new material?

o Does he trust his teacher (especially when new in
class)?

o Is he cooperative and does he follow directions
reasonably well?

o Does he control his own behavior?

o Does he have his own ideas?

o Does he talk freely about his ideas?

o Does he operate on his own with a minimum of
direction from the teacher?

o Is he generally a happy person? 

The preceding are only suggestions, but if adapted, they will serve as
the basis for developing an observation guide to assess self-concept.



Published Self-Concept Instruments 

Published instruments, with acceptable levels of reliability and 
validity, are an essential element in evaluating the self-concept of 
students in the program. However, because the students are sometimes 
not as mature as test publishers would like them to be, self-reporting 
instruments may lack desirable levels of reliability and validity.
This is especially true when the instruments are used with very young
children. Nevertheless, they do provide insights into student
behavior, especially when used in conjunction with other measuring
instruments.

Samples of published instruments should be examined, if possible, prior
to their purchase. If not, there are several reference works
available which provide descriptions of self-concept instruments.
Among them are R.C. Wylie's The Self Concept: A Review of
Methodological Considerations and Measuring Instruments, revised
edition, Lincoln: University of Nebraska Press, 1974; or R. Shavelson's
"Self-Concept: Validation of Construct Interpretations," Review of
Educational Research 46 (1976), pp. 407-441.

Due to the built-in difficulties of showing positive growth, a word of 
caution concerning pre- and posttest timing of self-concept 
instruments is needed. In order to avoid pretesting students that are 
already in the program and considering the fact that self-concepts are 
relatively stable, testing should be done as early as possible.
Subsequent posttesting should be spaced as far from the pretesting as 
is feasible. 

The message for the evaluator is clear. Unless self-concept measures
are carefully developed to reflect the unique characteristics of the
students in the program and carefully selected from reputable
publishers, there is a distinct possibility that, in spite of a
positive effort to show positive changes in self-concept as a result
of the project, negative changes actually result.

A list of published tests to measure self-concepts is attached.
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Published Self-Concept Scales 

1. Self Concept Picture Inventory (Wiseman & Adams) 

The Self Concept Picture Inventory was designed to 
evaluate grades one to three of Title I programs in 
Alton, Illinois. The test is appropriate for younger 
students and is relatively free of racial and sex 
biases.

Wiseman, E.D. & Adams, ^.H. "Self Concept Picture
Inventory". Alton, Ill. 1972. (see ERIC ED 170-299)

2. The Florida Key: A Scale to Infer Learner Self Concept
(Purkey)

The Florida Key is a learner self-concept scale that,
with adaptations, may be used with students of all
ages. The scale is designed to aid teachers in
evaluating students' self-concepts as learners, as well
as attitudes toward school.

Purkey, W.W. et al., "The Florida Key: A Scale to
Infer Learner Self Concept." Educational and
Psychological Measurement, 33 (1973) pp. 979-984.

3. Thomas Self-Concept Values Test (Thomas)

The Thomas Self-Concept Values Test measures fourteen
self-value dimensions, such as sociability, ability,
attractiveness, and independence. The 14-item test is
designed to be used with young children from 4 to 6
years old. However, some caution should be exercised
in interpreting test results given the problems of
self-concept measurement in young children.

Thomas, W.L., The Thomas Self Concept Values Test.
Combined Motivation Education Systems, Inc., 1969.
Rosemont, Illinois.

4. Self-Esteem Inventory (Coopersmith)

This 58-item scale was designed to measure general
self, social self-peers, home/parents, and school
academic self in addition to self-esteem. It is worded
to be used with children from 8 to 10 years old, but
has been used successfully with students in grades
three through twelve.

Coopersmith, S., The Antecedents of Self-Esteem. San
Francisco, California, W.H. Freeman & Co., 1967.
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5. Self-Concept of Ability Scale (Brookover)

The eight items contained in Form A are designed to 
measure self-concept of general academic ability; and 
the eight items in Form B are designed to measure 
self-perceptions of ability regarding science, 
mathematics, social studies, and English. The scale is 
most suitable for use with high school aged students. 

Brookover, W.B. et al., Relationship of Self-Concept
to Achievement in High School, 1967. Michigan State
University, Lansing, Michigan.

6. Piers-Harris Children's Self-Concept Scale (Piers and
Harris)

The 80-item instrument measures general self-concept 
and may be used for both research and diagnostic work. 
The simple descriptive statements are designed to 
measure ten self-concept dimensions and the scale is 
appropriate for use in grades three or above. 

Piers, E.V. & Harris, D.B. Piers-Harris Children's Self
Concept Scale, Nashville, Tenn. Counselor Recordings
and Tests, 1969.

7. How I See Myself Scale (Gordon) 

The How I See Myself Scale consists of 40 (elementary
form) or 42 (secondary form) items developed for use
with children ages 3 to 12 years. The scale has been
found to measure five self-concept dimensions: physical
appearance, interpersonal, teacher-student, academic
ability, and autonomy.

Gordon, I.J. A Test Manual for the How I See Myself
Scale. Florida Research and Development Council,
Tampa, Florida.

8. About Me by James Parker; Not Dated; Grades 4-6; James
Parker*.

A five-point self-rating scale assessing five areas of
self-concept which are expressed in behavior in the
school setting. Subscores included are: Self, Self in
Relation to Others, Self as Achieving, Self in School,
and the Physical Self.

*Included in Parker, James. "The Relationship of
Self-Report to Inferred Self-Concept," Educational and
Psychological Measurement, 26, pp. 691-700; 1966.
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9. The Behavior Cards: A Test-Interview for Delinquent
Children by Ralph M. Stogdill; c1941-50; Grades 3-10;
Stoelting Company.

Use of the Cards provides the child with an opportunity
to face his problems and provides an insight into the
child's attitudes toward his delinquent behavior. The
test is individually administered, employing the
card-sort technique. Any child who scores grade 4.5 or
higher on a standardized reading test should be able to
sort the cards with little assistance. Cards can be
read to subjects with reading disabilities. At times
an abbreviated version of the test can be given by
eliminating fifty specified cards. This eliminates
the more serious delinquent behaviors.

10. Behavior Rating Form by Stanley Coopersmith; Not Dated;
Grades Kindergarten-9; Stanley Coopersmith*.

A 13-item five-point rating scale devised for
appraising assured and confident behavior. Items refer
to such behavior as the child's reaction to failure,
self-confidence in a new situation, sociability with
peers, and the need for encouragement and reassurance.
The form yields two scores: Esteem Behavior and
Confidence Behavior.

*Data is available in: Coopersmith, Stanley;
Antecedents of Self-Esteem; San Francisco; W.H.
Freeman, 1967.

11. Children's Self-Conception Test: Form II by Marjorie
B. Creelman; c1954-55; Grades 3-6; Marjorie B.
Creelman.

Designed to assess the relationship of self-concept to
adjustment or maladjustment. Employs a series of
pictures depicting situations commonly experienced by
children in Western culture. Test provides indications
of self-esteem and moral standards.

12. Children's Self-Social Constructs Test: Primary Form
by Edmund H. Henderson, Barbara H. Long, and Robert C.
Ziller; 1967; Grades 1-6; Edmund H. Henderson.*

A measure of social self-concept from which certain
aspects of the child's conceptions of himself are
inferred. Subscores include: Self Esteem, Social
Interest or Dependency, Identification, Group
Identification, Individuation or Minority
Identification, Power, Egocentricity, Complexity,
Realism for Size, and Preference for Others.
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*Listed as available from Edmund H. Henderson; may now
be obtained from The Office of Special Tests, 
Educational Testing Service, 17 Executive Park Drive, 
NW, Suite 100, Atlanta, Georgia 30329. 

13. Columbus Sentence Completion for Children by Jack A.
Shaffer and Arthur S. Tamkin; Not Dated; Ages
4-Adolescence; Jack A. Shaffer.

A general projective test covering the following
topics: Self-Concept, Wishes and Plans, Self-Concept
(Problems), Family, Social, School, and Picture of
Self. The test provides an indication of the child's 
adjustment level. 

14. Coopersmith Self-Esteem Inventory: Form A by Stanley
Coopersmith; Not Dated; Ages 9-Adults*; Stanley
Coopersmith.

Designed to provide a general assessment of
self-esteem. The 58 items are arranged into five
subscales: General Self, Social Self-Peers,
Home-Parents, Lie Scale, and School-Academic.

*Can be used with children younger than age 9 if 
individually administered. Technical information is 
available in: Coopersmith, Stanley. Antecedents of 
Self-Esteem ; San Francisco; W.H. Freeman, 1967. 

15. Coopersmith Self-Esteem Inventory: Form B (Short Form) 
by Stanley Coopersmith; Not Dated; Ages 9-Adults; 
Stanley Coopersmith*. 

Designed to measure self-esteem from the perspective of 
the subject. Emphasis is placed on the subject's
self-attitudes in four areas: peers, parents, school
and personal interest.

*Additional information is available in: Coopersmith,
Stanley, Antecedents of Self-Esteem; San Francisco;
W.H. Freeman, 1967.

16. Expanded Test Anxiety Scale for Children (Feld and
Lewis 1969) by Sheila C. Feld and Judith Lewis; 1969;
Grades 1-9; Sheila C. Feld*.

A modification of the Sarason Test Anxiety Scale for
Children which includes the original and revised
questions and two neutral items about dreams and 
achievement. Subscales include: Test Anxiety, Remote 
School Concern, Poor Self-Evaluation, and Somatic Signs 
of Anxiety. 
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*Included in Feld, S. and Lewis, J. "The Assessment of
Achievement Anxieties in Children." In C.P. Smith
(Ed.). Achievement Related Motives in Children. New
York: Russell Sage Foundation, 1969, pp. 151-199.

17. How I See Myself Scale: Elementary Form by Ira J.
Gordon; 1968; Grades 3-6; Ira J. Gordon (Manual is
available from the Florida Educational Research and
Development Council).

Factors assessed are Teacher-School, Physical
Appearance, Interpersonal Adequacy, and Academic
Adequacy.

18. How Much Like Me? by Dale W. Dysinger; Not Dated;
Grades 3-5; Dale W. Dysinger.

A self-administered measure of general self-concept.

19. Inferred Self-Concept Judgment Scale by Elizabeth
McDaniel; 1965-69; Grades 1-9; Elizabeth McDaniel.

Designed to measure the student's self-concept as it is
generated and in the school setting.

20. Inferred Self-Concept Scale: Experimental Form by
Elizabeth L. McDaniel; 1969; Grades 1 and above;
Western Psychological Services, 12031 Wilshire Blvd.,
Los Angeles, California.

Scale is based upon the assumption that self-concept
can be inferred from manifest behavior. Scale purports
to be appropriate for assessing and comparing
self-concepts of culturally different groups. Test may
also be used with adults and juveniles.

21. Instructional Objectives Exchange: Measures of
Self Concept, Kindergarten-Grade 12, Revised Edition;
1972; Grades Kindergarten-12; Instructional Objectives
Exchange.

A series of affective objectives concerning the
learner's self concept. Dimensions employed are peer,
scholastic, family, and general. Self-report
inventories (direct and indirect) and observational
inventories are provided to assess the attainment of
each objective.

22. Instructional Objectives Exchange: Objectives
Collection in Attitude Toward School,
Kindergarten-Grade 12, Revised Edition; 1972; Grades
Kindergarten-12; Instructional Objectives Exchange.

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A collection of affective objectives dealing with the 
learner's self-concept as reflected in attitudes toward 
teacher, school subjects, learning, peers, social 
structure and climate, and general attitudes. An 
observational indicator and both direct and inferential 
self-report measures are provided to assess the 
attainment of each objective.

23. Morgan Punishment-Situation Index by Patricia K. Morgan;
Not Dated (Test is copyrighted); Ages Children 9-12 and
their mothers; Eugene L. Gaier.

A projective device specifically concerned with the
perception of the direction of aggression in the
punishment situation. The Index yields four concepts
operating in the punishment situation: the child's
self-concept, his concept of his mother, the mother's
self-concept, and her concept of the child. Employs
scoring procedures developed for the Rosenzweig
Picture-Frustration Test.

24. Rogers' Personal Adjustment Inventory by Carl R. 
Rogers; 1961; Ages 9-13; Western Psychological 
Services. 

Designed to assess a child's attitude toward himself,
his family, and his peers. Subscores include:
Personal Inferiority, Social Maladjustment, Family
Maladjustment, and Daydreaming.

25. Sears Self-Concept Inventory: Abbreviated Form by
Pauline S. Sears; 1966; Grades 3-6; Pauline S. Sears.

The child rates himself in terms of: Physical Ability,
Attractive Appearance, Convergent Mental Ability,
Social Relations with Same Sex, Social Virtues,
Divergent Mental Ability, Work Habits, Happy Qualities,
and School Subjects.

26. Self-Concept Adjective Checklist by Alan J. Politte;
c1971; Grades Kindergarten-8; Psychologists and
Educators, Inc.

Enables the student to project his personal feelings
related to self-concept phenomena and provides indices
of his general levels of self-concept feelings. The
adjectives cover the following: Physical Traits,
Social Values, Intellectual Abilities, and
Miscellaneous (emotional feelings, group behaviors, and
habits). As a result of the scoring, the child is
identified as "self-confident," "poor self-concept," or
"aggressive."

27. Self-Concept and Motivation Inventory: Later
Elementary Form by George A. Farrah; c1968; Grades 3-6;
Person-O-Metrics.

Measures academic self-concept in terms of the child's
perception of his role as a learner. The inventory
yields scores for role expectations, self-adequacy,
goal and achievement needs, and failure avoidance.

28. Self-Concept As A Learner Scale-Elementary by John K.
Fisher; Not Dated; Grades 3-6; John K. Fisher.

The SCALE is a modification of the secondary scale
developed by Walter B. Waetjen. Subscores include:
Motivation, Task Orientation, Problem Solving, and
Class Membership. The Motivation factor is designed to
determine the degree to which the respondent perceives
himself motivated to do school work and to participate
in learning activities. Task Orientation refers to the
way a student sees himself relating to learning
activities. Problem Solving determines the view that a
pupil has of himself as a problem solver. The Class
Membership factor is designed to find out how the
student sees himself in relation to other members of
the class.

29. Self-Concept Instrument-A Learner Scale by Gordon P.
Liddle; 1967; Grades 3-6; Gordon P. Liddle.

Variables assessed are self-concept in reference to
motivation, intellectual ability, task orientation, and
class membership.

30. Self-Concept of Ability Scale; 1963-68; Grades 2-6;
University of Maryland Research and Demonstration
Center of the Interpersonal Research Commission on
Pupil Personnel Services.

Designed to assess change in self-reported attitudes of
groups of students toward themselves as learners.
Covers six academic content areas: arithmetic,
English, social studies, science, music, and art. The
bases of comparison are the class, the grade level,
close friends, future high school class, future college
associates, other students in general, and one's own
ability. The scale was adapted from Brookover,
Peterson, and Thomas' Self-Concept of Ability.

31. Self-Concept Target Game by Ann Fitz-Gibbon; 1970; Ages
9-10; Ann Fitz-Gibbon.

Designed for use with children who have participated in
the Responsive Model Follow Through Program. It is a
measure of self-concept in terms of the child's
willingness to take reasonable risks of failure, make
positive estimates of his ability to perform a task,
make realistic statements about the probability of
being right or wrong, learn from errors and
corrections, use failure in a productive manner, and
take credit for accomplishments and acknowledge
failure. Individually administered.

32. Self Profile Q-Sort by Alan J. Politte; c1970; Grades
3-8; Psychologists and Educators, Inc.

Aids in elementary school counseling by providing a 
means for eliciting self-evaluation from a student, for 
investigating changes in a student's self-concept 
through the course of counseling sessions, and for 
stimulating group interaction in the counseling 
setting. 

33. A Semantic Differential for Measurement of Global and
Specific Self-Concepts by Lois Stillwell; Not Dated;
Grades 1-3 and 4-6; Lois Stillwell.

Test can be modified to assess attitudes towards self 
in a variety of specific roles or conception of self 
from the point of view of a stated referent. The 
Primary Form is appropriate for Grades one through 
three and the Upper Grades Form is for the fourth grade 
and beyond. Test can be group administered easily to 
those in grade three or higher. First and second 
graders may have difficulty and will require several 
assistants to provide close observation. Subscores 
include: Myself, Myself As a Student, Myself As a
Reader, Myself As an Arithmetic Student.

34. Tennessee Self-Concept Scale: Clinical and Research
Form by William H. Fitts; c1964-70; Ages 12 and Above;
Counselor Recordings and Tests.

Yields 30 profiled scores: Self-Criticism, Self-Esteem
(Identity, Self-Satisfaction, Behavior, Physical Self,
Moral-Ethical Self, Personal Self, Family Self, Social
Self, Total), Variability of Response (Variation across
First Three Self-Esteem Scores, Variation across Last
Five Self-Esteem Scores, Total), Distribution, Time,
Response Bias, Net Conflict, Total Conflict, Empirical
(Defensive Positive, General Maladjustment, Psychosis,
Personality Disorder, Neurosis, Personality
Integration), Deviant Signs, and five scores consisting
of counts of each type of response made.

35. Tennessee Self-Concept Scale: Counseling Form by
William H. Fitts; c1965-70; Ages 12 and Above;
Counselor Recordings and Tests.

Yields 15 profiled scores: Self-Criticism, Self-Esteem
(Identity, Self-Satisfaction, Behavior, Physical Self,
Moral-Ethical Self, Personal Self, Family Self, Social
Self, Total), Variability of Response (Variation across
First Three Self-Esteem Scores, Variation across Last
Five Self-Esteem Scores, Total), Distribution, and
Time.

36. What I Am Like; Not Dated; Grades 4-10; Cincinnati
Public Schools, Division of Psychological Services and
Division of Program Development.

A five-point, bi-polar, self-rating scale based on
Osgood's concept of the semantic differential.
Subjects are: What I Look Like, What I Am Like When I
Am With My Friends, and What I Am. The test is for
research only and is to be used only in group
assessment.

37. When Do I Smile? by Dale W. Dysinger; Not Dated; Grades
1-5; American Institutes for Research.

Variable assessed is self-concept in reference to the
school setting.

ETHNOGRAPHIC METHODS OF PROGRAM DESCRIPTION



Introduction 

This document discusses the collection of data and the eventual
development of a program description. The ethnographic procedures
outlined fulfill the basic requirements of the program description
needed for a local school district bilingual education evaluation
report. The use of ethnographic methods for program description is
not common in bilingual education evaluations. However, such
procedures have long been suggested for use in examining the efficacy
of bilingual education programs. These procedures are relatively easy
to implement and require a low expenditure of time and energy.
Furthermore, they necessitate minimal skill development on the part
of the evaluator and/or other data collectors.

While necessary components of program evaluations, psychometric and
quantitative descriptions of bilingual education programs are limited
in the breadth of useful information provided. The demographic
numbers, the enumeration of staff rolls, the description of physical
features, the indications of time allocations for various
instructional components, etc. are all included as necessary parts of
most evaluation reports of school programs. This information,
however, seldom provides all the insights needed to comprehend the
actual process of schooling that occurs in the educational program
evaluated. Very often, nonmeasurable aspects of schooling such as
classroom climate, teachers' attitudes, instructional interactions,
etc. are ignored. Yet this subjective characterization can provide an
account of the features of a program that may well be responsible for
its success or failure in meeting educational goals. This document
proposes an addition to the traditional information included in
evaluation reports, namely, an ethnography of classrooms and program.

School programs are complex. They change over time. They do not
always conform to a priori theorizing or standardization of procedure
or to goals that have been previously set. They cannot be
characterized as consisting of isolated and discrete occurrences of
phenomena each having meaning only in a strictly defined, contained,
or denotative sense. All aspects of a program are related to each
other and to the participants taking part in the program. Because the
programs are not homeostatic, with variables that can be isolated,
each with a singular, independent effect, they do not lend themselves
to manipulation by evaluators. The process of schooling and indeed of
bilingual education has multiple realities and, as such, events must be
understood from the perspective of the total program.

Quantitative evaluations and their program descriptions attempt to
discover, verify, or identify causal relationships among concepts
derived from a theoretical scheme that may or may not reflect the
reality of the program. Frequently, interrelationships among the
various aspects of the program may not be clear. As a result,
replication of the program may not be possible. At times, a program
description in purely quantitative evaluations represents an
"outsider's" cursory comprehension of a program's operation that lacks
verification by the participants as to its accuracy and explanatory
input from "insiders" as to its content. Qualitative descriptions,
such as are suggested in this document, can reflect both what the
"insiders" (the managers and the teachers) believe is occurring as
well as represent what is actually seen to occur by the evaluator
("outsider").

This document is organized into four parts: Definition of
ethnographic methods, types of information to acquire, data collection
and instrumentation, and information usage and reporting.

Ethnography 

Ethnographies attempt to accurately describe what is occurring in a
given situation. They define or redefine reality. In ethnography,
the evaluator attempts to understand what is happening in that
setting, how it is occurring, how the participants view that
occurrence, and how members of various groups participate within and
across these occurrences. With an ethnography, the evaluator does not
judge what occurs as either good or bad, as effective or ineffective;
rather, ethnographies describe the relevant information in a situation
and examine the recurrent patterns of behaviors. From this
characterization, the ethnographer defines the rules and processes for
the participants and for participation in the context described. The
process of constructing an ethnography involves the development of a
picture that explains the reality observed. Ethnographies, then,
involve a three-step process: First, data is collected that helps
describe what is occurring; second, a typology or model is developed
that reflects these occurrences; and third, the validity of the model
is tested and implications are drawn.

Ethnographies add to the information usually obtained by other
traditional methodologies of program description in the following
ways:

1. Ethnographies are concerned with the culture of
the situation observed.

2. Ethnographies necessitate direct, on-site
observations occurring over time and at times
necessitate participation by the evaluator in the
activities taking place.

3. The instruments used are field-based and are used
to determine reality.

4. Ethnographies are holistic and characterize how
various parts of a programmatic puzzle fit
together.

5. Patterns and hypotheses developed result from an
immersion in the field by determining what
actually occurs in the field and not from
predetermined theorizing of what should occur.

Ethnographies are not brief or selected samplings. Rather, they
involve complete descriptions of the interrelationships of recurring
variables in a society under specified conditions as they affect or
produce certain results and outcomes in that society. As such,
ethnographies add a needed dimension to evaluative program
descriptions.

Types of Information to Obtain 

In addition to providing students with access to the skills and
knowledge expected from traditional education of monolingual children,
bilingual education programs are by definition different from
traditional school programs in two important ways. First, two
languages are used as media of instruction and language development is
an integral part of the educational program. Secondly, the culture of
the children (including attention to self-concept, learning styles,
motivational styles, etc.) is an important consideration in the
process of schooling. As a result, it is important in the program
description section of the evaluation report to characterize the
bilingual classroom and/or program in each of these three important
areas: Use of language(s), incorporation of culture, and the
instructional focus.

Since the quantity of information that can be obtained about a
particular classroom and/or program is very great, it may be necessary
to develop and keep in mind organizers and categories of the kind of
data needed. In each of the following three subsections, some salient
questions for and suggested areas of data collection are provided as a
general guide to data collection. The suggestions included in each
subsection are not intended to be all inclusive or totally complete.
The evaluator/ethnographer, in consultation with the program manager,
should be the best judge of what information should be obtained in
order to adequately and accurately depict the processes operant in the
bilingual program.

Language -- A description of how participants use language to
communicate information in bilingual school settings, how they
influence and persuade others, how they negotiate using language, etc.
is necessary if one is to understand the bilingual classroom. This
subsection includes suggested information that can be useful in
describing language used in a given and specific situation.
Additionally, this subsection provides general guidelines for
information procurement that can characterize the use of language in
bilingual classrooms and/or programs.

A language-use mapping technique (Green and Wallat, 1981)* has been
suggested for describing the language used by both teachers and
students in a bilingual classroom. Coding of information that can be
gathered in a given language-use situation can be as follows:



* Green, J.L. and Wallat, C. "Mapping Instructional Conversations: A
Sociolinguistic Ethnography." In: Ethnography and Language in
Educational Settings, edited by J.L. Green and C. Wallat. Norwood,
New Jersey: Ablex Publishing Corporation, 1981, 161-195.

Source = This category identifies the speaker
involved in the interaction. Possible individuals
may include the teacher, the student(s), or some
other person(s). The interlocutors can be
identified by a code (e.g., teacher=T,
student(s)=S1, S2, etc.).

Form = Two forms of language used during
instruction can be identified: The question and
the response. (It is assumed that most
interactions in an instructional period are either
questions or responses to questions.) Three
subcategories of responses can be expected:

a. Type A response (+) = This type of response 
is both expected and consistent with the 
sociolinguistic content. This predictable 
response includes those that meet the social, 
cultural, psychological, and semantic aspects 
of the situation. 

b. Type B response (0) = These responses are not
predictable given the preceding linguistic,
topical, or social context. These responses 
may be spontaneous production of language by 
an involved student or they may be responses 
by a student not previously designated in the 
interaction. 

c. Type C response (-) = All nonresponses are 
included in this category. 

Strategies = The purpose of the communication unit
in its sociolinguistic context is mapped in this
category. The various types of strategies
include:

a. Focusing = This occurs when an attempt is 
made to initiate or change the content of a 
discussion. A shift or focus function 
results. 

b. Ignoring = This is a nonverbal action 
resulting when no response occurs when one is 
required.

c. Confirming (+) = The acceptance of a 
preceding response is indicated either 
verbally or nonverbally. 

d. Confirming (-) = Here the previous response
is not accepted. A "no" may be indicated as
a response to a request for confirmation.

e. Continuance = A verbal or nonverbal message
suggesting that the listener is following the
speaker's communication.

f. Extending = This category includes those
messages that are designed to provide
additional information about a topic.

g. Raising = This category of communication
raises the level of a discussion.

h. Clarifying = Here, messages that explain or
redefine are included.

i. Editing = In this strategy, shifts or changes
in content, form, or strategy are signaled.
Internal mediating of a message occurs.

j. Controlling = Messages that control an
interaction or behavior of individuals are
included in this category.

k. Refocusing = This type of language strategy
reestablishes a previous line of thinking.

l. Restating = In this category are included
those messages that repeat or refer to
previous information.

Levels = The level of functioning of the
interaction can be categorized into three groups.
They are:

a. Factual = Literal recall of facts from memory
is relayed in the message.

b. Interpretive = Inferential comprehension 
providing information not previously 
discussed is indicated in the interaction. 

c. Applicative = This level of communicative 
interaction requires the information to be 
used in new ways or in novel contexts. 

Ties = The basis of the message is often tied to
some behavior or message of participants in an
interaction. This relationship is indicated by
the "ties" described. The four sources of ties
include:

a. Teacher = Here, the message is tied to a
teacher's goal or may be in response to the
teacher's message.

b. Student = If the message is feedback to the
student or if it extends to the student's
purpose or it permits the student to build on
his/her original message, then it has a
"student tie."

c. Instructional (media aide) = If the text,
material, or media aide triggers the message
unit, then it is recorded as an instructional
tie.

d. Context = A context tie occurs when the
situation serves as the basis of the
message.
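
The coding categories above (Source, Form, Strategies, Levels, Ties) can be kept as a simple record per message unit. The following is only an illustrative sketch, not part of the Green and Wallat technique itself: the names MessageUnit and tally, the spelled-out code strings (e.g., "response+" for a Type A response), and the sample lesson are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Category codes transcribed from the mapping scheme described above.
# The string spellings are hypothetical; any consistent codes would do.
FORMS = {"question", "response+", "response0", "response-"}
STRATEGIES = {
    "focusing", "ignoring", "confirming+", "confirming-", "continuance",
    "extending", "raising", "clarifying", "editing", "controlling",
    "refocusing", "restating",
}
LEVELS = {"factual", "interpretive", "applicative"}
TIES = {"teacher", "student", "instructional", "context"}


@dataclass(frozen=True)
class MessageUnit:
    """One coded message unit from an observed interaction."""
    source: str    # speaker code, e.g. "T", "S1", "S2"
    form: str
    strategy: str
    level: str
    tie: str

    def __post_init__(self):
        # Reject codes that are not part of the scheme.
        checks = (
            ("form", self.form, FORMS),
            ("strategy", self.strategy, STRATEGIES),
            ("level", self.level, LEVELS),
            ("tie", self.tie, TIES),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"unknown {name} code: {value!r}")


def tally(units, field):
    """Count how often each code of one category occurs."""
    return Counter(getattr(unit, field) for unit in units)


# Invented sample data: four message units from a hypothetical lesson.
lesson = [
    MessageUnit("T", "question", "focusing", "factual", "instructional"),
    MessageUnit("S1", "response+", "extending", "factual", "teacher"),
    MessageUnit("T", "question", "raising", "interpretive", "student"),
    MessageUnit("S2", "response0", "clarifying", "interpretive", "teacher"),
]

print(tally(lesson, "form"))
print(tally(lesson, "level"))
```

Tallies of this kind only summarize the map; the sequence of units in the order observed remains the primary record.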

Many situations may be mapped in obtaining information regarding the
use of language by students, teachers, and others in bilingual
programs. Some relevant contexts include a typical lesson, the use
of language in informal play situations, the use of language in formal
non-structured classroom situations, or other contexts as may be
mutually agreed to by both the program managers and the evaluator.

An ethnography of the use of language in a bilingual classroom or 
program may provide insights regarding the following pedagogical 
questions: 

1. How does the teacher use language in instructing
children?

2. How do children use language with each other?

3. How do children use language with teachers, aides,
parents, etc.?

4. How does the teacher respond to L2 language
attempts of the bilingual children?

5. What is the language development climate in the
classroom?

6. Which language (L1 or L2) is used with which
interlocutors?

7. Which language (L1 or L2) is used with various
topics?

8. Which language (L1 or L2) is used with which
situations?

9. What language use opportunities exist in the
bilingual classroom?

10. What opportunities exist for the student to
experiment using language with different
participants, on different topics, and for
different purposes?

11. How does the teacher deliberately attempt to
develop the language skills of the bilingual
children (either L1 or L2)?

12. How does the teacher respond to "nonadult-like"
speech (grammar and phonology) from bilingual
children?

13. What is the teacher's instructional register?

14. How are children of varying linguistic and
communicative proficiencies accommodated in
the classroom?

15. To what extent and under what circumstances
do students with different language dominances
interact?
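
Questions 6 through 8 above ask which language is used with which interlocutors, topics, and situations. One possible way to organize such observations is a simple cross-tabulation. The sketch below is an assumption-laden illustration: the function name language_use_table, the dictionary keys, and the sample observations are invented for the example and do not come from the Handbook.

```python
from collections import Counter


def language_use_table(observations, dimension):
    """Count (dimension value, language) pairs across observations."""
    return Counter((obs[dimension], obs["language"]) for obs in observations)


# Invented sample observations; each records the language heard and
# the interlocutor, topic, and situation in which it was used.
observations = [
    {"language": "L1", "interlocutor": "peer", "topic": "game", "situation": "play"},
    {"language": "L1", "interlocutor": "peer", "topic": "family", "situation": "play"},
    {"language": "L2", "interlocutor": "teacher", "topic": "arithmetic", "situation": "lesson"},
    {"language": "L2", "interlocutor": "peer", "topic": "arithmetic", "situation": "lesson"},
]

by_interlocutor = language_use_table(observations, "interlocutor")
for (who, language), count in sorted(by_interlocutor.items()):
    print(f"{who:8} {language}  {count}")
```

The same function can be called with "topic" or "situation" to address questions 7 and 8. Counts of this kind supplement, but do not replace, the narrative description of how language is used.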



Instruction -- The volume of information that can be obtained in
observing instruction that takes place in a bilingual program can be
overwhelming. Oftentimes, the focusing of the perceptions of the
observers may be necessary. Questions such as those posed below
contain no preconceived hypotheses and can be used as a guide for the
gathering of data as well as the interpretation of patterns
perceived.

MAJOR CATEGORIES IN HENRY'S
CROSS-CULTURAL OUTLINE OF EDUCATION* 

1. On what does the educational process focus?

2. How is the information communicated (what are the
teaching methods employed)?

3. Who educates?

4. How does the person being educated participate?

5. How does the educator participate? What is
his/her attitude?

6. Are some things taught to some and not to others?

7. Discontinuities in the educational process.

8. What limits the quantity and quality of
information a child receives from a teacher?

9. What forms of conduct control (discipline) are
used?

10. What is the relation between the intent and the
results of education?

11. How long does the process of formal education
last?



In addition, information and conclusions relating to the general
climate of the classroom and/or program can be obtained by the
evaluator.



* J. Henry. "The Cross-Cultural Outline of Education." In: Current
Anthropology, 4, 1960, 269-305.

Culture -- Information regarding culture is probably the most
difficult to obtain. As used in this context, culture does not mean
the surface trappings or artifacts associated with a group of people
like sarapes, clothing, festive or national holidays, and foods. It
would involve, however, the recurrent behavior patterns, thinking,
perceptual, and learning styles. In addition, culture would include
information regarding the attitudes, values, and communication styles
as well as the expected and the manifest norms found in a classroom or
found to be true of a program. Essentially, the evaluator attempts to
find out what is the culture of the bilingual component being
evaluated. The following are suggested as guides to the types of
information that may be obtained in conducting an ethnography of the
culture of a classroom/program.

A. Behavior patterns

1. What are the expected and manifest norms?

2. What is considered by teachers and students to be
acceptable behavior?

3. What is the community's expectation of acceptable
behavior?

4. How do students respond to stress?

5. How do students respond to instruction?

6. How do students respond to independence as well as
to structure or to the lack of structure?

7. What discipline is imposed by the school? How is
it imposed?

8. What games do students play? (Game theory)

9. How do students interact with elders and act
toward peers?

10. What is the pattern of behavior characteristic of 
cross-ethnic or cross-linguistic groupings? 

B. Perception and thinking

1. What topics are of concern to students?

2. Do students personalize or depersonalize topics of
instruction?

3. Is instruction related to the student's personal
milieu?

4. Are the following reasoning styles evidenced:
Difference, magnitude, relationship, or
appraisal?

C. Learning styles and motivation

1. Do students prefer to work as groups or
individually?

2. Do students prefer visual or auditory
presentations?

3. Is a preference shown for deductive or inductive
presentations?

4. What motivates the students?

5. What reward systems are used in the classroom?
How successful are they?

6. Do students impose structure in learning
situations or do they need structure to be imposed
for them?

7. What is the effect of peer pressure on
motivation?

8. Is there a preference for personalized or
depersonalized instruction?

9. On what tasks do students prefer to work?

D. Values

1. What value statements are heard from teachers or
students?

2. Are students' values accepted?

3. What value prejudices are manifested? (Tastes,
preferences, and individual choices)

4. What universal values are encouraged? (These are
broad moral values covering such general topics as
fair treatment, individual rights, equal
opportunity under the law, acceptance of diversity
of sex and race, and the respect for individual
expression of diversity.)

E. Attitudes

1. What are the students' attitudes toward the
school? the program? the instruction or
instructors?

2. What are the teachers' attitudes toward the
students? the program? the school?

3. What are the teachers' attitudes toward the
cultural or linguistic groups represented in the
school bilingual program?

4. What are the taboo topics not to be covered or
discussed in school?

5. What status does the L1 (native language) have?
the L2 (second language)?

6. What is the intellectual climate of the school?

7. Can a temperament, mood, nature, etc., of the
program be determined?

F. Communication styles

1. What language is used in informal situations? by
whom?

2. What language is used in formal situations? by
whom?

3. Are formal and informal codes of language used by
students? when? with whom?

4. To what extent is the L1 (native language) or L2
(second language) encouraged or allowed throughout
the day?

5. With whom can and do students use L1 and L2?

6. Which language is used in discussing which
topics?

G. Accommodation by the school

1. To what extent is the classroom and program 
organization flexible in accommodating the 
differences found among the students? 

2. Are norms of behavior, values, attitudes, etc.
imposed?

3. What is the school's written or unwritten policy
regarding the cultural and linguistic differences
found in the bilingual program?

4. How is the bilingual education program viewed by 
nonbilingual teachers and administrators? 



Data Collection 

Data collection involves a three-step process. First, sources of
information must be secured. Secondly, the information must be
organized. Finally, hypotheses must be verified.

Sources -- Information can be obtained from many sources in bilingual
programs. The four main sources include printed matter, participant
observation, nonparticipant observation, and the use of an informant.

Most programs have developed a brochure or some other printed matter
which describes the intent, operation, size, etc. of bilingual
schooling and which has been used in disseminating information
throughout the local community. Often, this printed matter is used
with parents in recruiting students, with school boards in discussing
approval of the bilingual program, and in the recruitment of teachers.
In addition, some local newspapers may have written articles to inform
the public of the local school's bilingual programs. The project's
proposals to either the State or federal funding agency can provide
other preliminary information that may be useful in understanding the
nature of the bilingual program. With long standing programs,
previous evaluation reports may provide some data that is still
current and which may be useful in the initiation of the data
collection period and the categories of data collection. As with all
sources of information, the accuracy should be determined by direct
observation and by confirmation from program managers and teachers.

In the participant observation strategy, the evaluator becomes 
involved in the normal activities of a classroom or program in order 
to gain direct access to information regarding classroom instruction 
and the normal operations of the bilingual programs. By becoming an
"insider," the evaluator often becomes privy to information that
otherwise might be kept from him/her. As participant observer, the
evaluator seeks to blend into the operation of the program. At no 
time should the evaluator attempt to change or judge what is seen to 
occur. 

One caution must be mentioned here. Participant observers need to
deliberately remain intellectually separate from the program. Too
close an identification with the participants in the program or its
philosophy or structure may result in the evaluator sharing the biases
of the group involved in the program. As a result, objectivity may be
lost.

The nonparticipant observation strategy requires that the evaluator
take note of occurrences without participating in them. The nature of
this noninvolvement, however, may influence the behaviors of those who
are observed. Frequently, those who are observed will change
behaviors to reflect what they value as teachers and managers. As
altered as the primary information may be, however, it still provides
the evaluator with useful data regarding the ideals of behavior as
viewed by the participants in the program.



In the informant strategy, the evaluator seeks out some knowledgeable
person(s) from whom to secure information regarding the program.
Structured interviews in which specific questions are asked of the
informant may be used. With this strategy, specific information
regarding unclear perceptions may be obtained. Care must be taken, of
course, to avoid adopting the biases of the informant. Verification
should always be part of any data gathering period.

Information Organization -- Information gathered from the various
sources may be collected in field notes taken by the evaluator.
These, of course, may be written either at the time of the occurrence
and observation or soon after. Little time should elapse between the
procurement of the information and the recording. Otherwise, memory
and perceptions may become somewhat hazy.

In compiling and interpreting notes, every attempt should be made to 
accurately describe the situation. In this depiction, no prejudgment, 
bias, deletion, or predrawn conclusions should affect the information
that is obtained.

[Figure: flow of the ethnographic process. Box labels: Information;
Analysis and Reflection; Synthesis and Determination of Patterns;
Verification; Conceptualization of Model]

With the gathering of data, reflection should occur. In this
examination, existing information should be analyzed, and recurrent
patterns, concepts, and common themes from natural groupings of
information should be sought. Consistent feedback and verification of
the perceived themes and patterns should occur along with the
compilation of the data.

Upon completion of the gathering of data and its analysis, a
conceptual interpretation should emerge in the form of a model
describing the program. This model should accurately reflect the actual
program that was examined in that it was grounded in the data
obtained. Finally, hypotheses and policy statements that may be
requested by the program managers can be developed from the model
that was constructed.

Verification -- The final step in the ethnographical process is the
verification of the model. This can occur in either written or oral
form, with the managers or other participants of the program taking
part. The purpose of an ethnographical description of a bilingual
program is to accurately depict the character of that program. This
can be accomplished by the fusion of the perceptions of the
ethnographers with the intents, understandings, and
conceptualizations of the participants in the program.

The evaluator can validate perceptions used in the construction of the
model directly, by asking the participants in the program to review the
draft of the program description part of the evaluation report.
Indirectly, the same verification can be obtained by soliciting
information related to the model through the following types of
questions asked of the participants in the program:

1. Reportorial = These are literal questions of a
   who, where, what, how, and why nature. These
   questions are used primarily to verify the facts
   included in the description.

2. Posing = These questions challenge or act as
   devil's advocates by determining the strength of
   the participants' convictions and consistent use
   of various procedures. The model developed by the
   evaluator/ethnographer is true only as it is
   consistently accurate under various conditions.

3. Hypothetical = These questions are of a "what
   if..." or "what would happen if..." nature. Use
   of these questions can help the
   evaluator/ethnographer determine the strength of
   the model under unknown or novel circumstances.

4. Posing the ideal = Here, the program participant
   is asked to describe the ideal situation, or the
   evaluator/ethnographer solicits information
   regarding the aspirations and goals of the
   participants as well as perceived faults with the
   existing program.

5. Offering interpretations or testing propositions
   on respondents = This allows the evaluator to tell
   the program participants about the propositions or
   patterns that are being used in the construction
   of the model. If the program participants
   disagree with the conclusions drawn, then new
   information can be secured and/or new patterns
   conceptualized.



Instrumentation 

The following general outline or instrument is suggested for the
gathering of field notes, their compilation into patterns, and their
verification. Notes are recorded in four columns or sections. The
first section, labeled observations, should contain the most
information about what occurred in the field.

This section is followed by two "code" columns. In the first column,
the evaluator can record whether the information is related to the use
of L1 (native) or L2 (second) languages, and/or whether the information
provides insights regarding culture (C) or instruction (I). In the
second column, more specific information regarding these three
categories can be recorded. For example, information related to use
of language on topics related to schooling (S), relationships (R), or
home (H) can be indicated. Additionally, information related to
values (V), behavior (B), learning styles/motivation (L), or
perception/thinking (P) can be noted.
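
For evaluators who keep field notes electronically, the coding scheme described above might be sketched as follows. This is only an illustration under stated assumptions: the class name `FieldNote`, the function `tally_codes`, and the three sample observations are hypothetical and do not come from the Handbook; only the code letters are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass, field

# Code 1 values named in the text: language used, culture, instruction.
CODE_1 = {"L1", "L2", "C", "I"}
# Code 2 values named in the text: topic of language use and related detail.
CODE_2 = {"S", "R", "H", "V", "B", "L", "P"}


@dataclass
class FieldNote:
    """One row of the field-note instrument (hypothetical layout)."""
    observation: str
    code_1: set = field(default_factory=set)
    code_2: set = field(default_factory=set)
    pattern: str = ""
    verification: str = ""

    def __post_init__(self):
        # Reject any code letter that the instrument does not define.
        unknown = (self.code_1 - CODE_1) | (self.code_2 - CODE_2)
        if unknown:
            raise ValueError(f"unknown code(s): {sorted(unknown)}")


def tally_codes(notes):
    """Count how often each (code 1, code 2) pair occurs across the notes."""
    counts = Counter()
    for note in notes:
        for c1 in note.code_1:
            for c2 in note.code_2:
                counts[(c1, c2)] += 1
    return counts


# Invented sample observations, for illustration only.
notes = [
    FieldNote("Teacher gives directions for a math task in Spanish.", {"L1", "I"}, {"S"}),
    FieldNote("Two students discuss a family visit in English.", {"L2"}, {"H", "R"}),
    FieldNote("Aide repeats the directions in English.", {"L2", "I"}, {"S"}),
]
counts = tally_codes(notes)
```

A tally of this kind is only a starting point for the search for recurrent patterns; the interpretation itself remains the evaluator's task.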



Observations | Code 1 | Code 2 | Patterns | Verification, Implications, Model




In the next section, preliminary ideas regarding recurrent patterns or
perceived relationships can be recorded. This column records
information regarding the meaning or interpretation of the data.

Finally, in the last section, verifications that were secured
regarding perceived patterns can be recorded. Implications for policy
statements can be listed. Modifications warranted by new information
can also be indicated. This section can contain the preliminaries of
the model describing the bilingual program.



In mapping classroom language use, the chart presented below can be
used. Language from the classroom should be recorded (taped or
written), and each line of interaction can then be numbered to
correspond to the analysis represented on the chart.

TABLE 1

Descriptive Analysis of Instructional Conversations

(Green and Wallat, Ethnography and Language in Educational Settings,
1981, 169.)

Final analysis of this language-use information can result from a 
seeking of recurrent patterns and the drawing of implications that 
help characterize a realistic model of instruction occurring in the 
bilingual classroom(s).
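
The numbering of recorded classroom language described above could be done by hand or, for a typed transcript, with a short routine such as the following. It is a minimal sketch, assuming one turn per line in the form "Speaker: utterance"; the function name `number_interactions` and the sample exchange are hypothetical and are not drawn from the Green and Wallat chart.

```python
def number_interactions(transcript):
    """Return (line number, speaker, utterance) for each non-blank line.

    Assumes one turn per line in the form "Speaker: utterance".
    """
    numbered = []
    for raw in transcript.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between turns
        speaker, _, utterance = raw.partition(":")
        numbered.append((len(numbered) + 1, speaker.strip(), utterance.strip()))
    return numbered


# Invented exchange, for illustration only.
sample = """
Teacher: Who can tell me what this word means?
Student 1: It means house.
Teacher: Good. Can you say it in Spanish?
"""
lines = number_interactions(sample)
```

Each numbered line can then be entered on the analysis chart under the same number.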
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Information Usage and Reporting 

Two types of reports can result from the evaluation suggested in this
section: first, case studies depicting individual classrooms, and
second, an ethnography of an entire program. Both types of reports
are similar in content but differ in scope. The
ethnography involves the synthesis of information from several
classrooms and, as a result, may be the more difficult to develop.

Similar to other types of programmatic descriptions, case studies and
ethnographies include a discussion of the history of the program, a
discussion of the student population in terms of language and
ethnicity, a description of the program's facilities, number of
students in the program, teachers involved, time allocations for
various instructional components, and enumeration of the goals of the
bilingual program. In addition, case studies and ethnographies
include the following types of information:


1. Discussion of entry procedures and site
   selection.

2. Characterization of the procedures and site
   selection.

3. Description of the encounters (contacts) with the
   students, teachers, and managers in the program.

4. Discussion of the classroom(s) in terms of its
   culture, the use of language, and the organization
   of instruction.

5. Perceived patterns and the model resulting from
   the synthesis of these patterns.

6. Implications, conclusions, and policy statements.
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SECTION III 
WORKSHEETS AND FORMS USED WITH THE 
DESIGNER'S MANUAL 



This section provides program directors and evaluators with a complete 
set of worksheets which are recommended in the Designer's Manual. 
These worksheets are included in this volume in order to facilitate 
their reproduction, dissemination and use. An index, by title and 
worksheet number (when appropriate), follows this brief introduction.

Comprehensive Test of Basic Skills, English
   Version 1973, Spanish Version 1978, Form S
IOWA Test of Basic Skills, 1978, Forms 7 and 8
Sequential Tests of Educational Progress (STEP)
   III, 1979, Forms X and Y                              III-56
Metropolitan Achievement Tests (MAT),
   1978, Forms 01 and Kl                                 III-59
SRA Achievement Series, 1978, Forms 1 and 2
Stanford Achievement Test, 1973, Forms A,
   B, and C
Test of Basic Experience (TOBE) II, 1978                 III-61

List of Test Publishers                                  III-73
Annotated List of Language Proficiency Tests             III-75

Basic Inventory of Natural Language (BINL)               III-77
Bilingual Syntax Measure (BSM)
Comprehensive English Language Test (CELT) for
   Speakers of English as a Second Language              III-79
Ilyin Oral Interview Test                                III-80
Language Assessment Battery (LAB)                        III-81
Language Assessment Scales (LAS)                         III-82
Primary Acquisition of Language (PAL)
Schutt Primary Language Indicator Test (SPLIT)

MEASURING SELF-CONCEPT                                   III-85

Published Self-Concept Scales
List of Test Publishers and Developers

ETHNOGRAPHIC METHODS OF PROGRAM DESCRIPTIONS             III-105


Worksheet #1

DETERMINE AUDIENCE AND INFORMATION REQUIREMENTS FOR THE EVALUATION

Audience | Type of Information Needed | Reason Information Is Needed | Date Information Is Needed | Type of Report and Section to Emphasize in Cover Letter

Page 1 of 1



Worksheet #2

SETTING PRIORITIES

Put a "1" by components which will receive maximum emphasis, a "2"
by components receiving moderate emphasis, a "3" by components receiving
minimum emphasis, and an "X" by components which will not be evaluated.

Evaluation Components | Done Last Year (19__) | This Year (19__) | Next Year (19__) | Following Year (19__)

A. Program Description Information
   1. Project Overview
   2. Instructional Approach
   3. Project Management

B. Program Operations
   1. Instructional Program Implementation
   2. Staff Development
   3. Parent Involvement

C. Student Effects
   1. English Language Component
   2. NonEnglish Language Component
   3. NonEnglish Academic Component
   4. Nonacademic Student Effects
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TIMETABLE FOR THE EVALUATION 



Year ____

MONTHS: Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Jul

Tasks

A. Plan Evaluation Design*
   1. Determine which goals and objectives in each
      component to focus on
   2. Cost out evaluation
   3. Summarize design for administrator

B. Project Description
   1. Collect data - divide up
   2. Summarize data
   3. Review and analyze data for purposes of planning
      its use in analyzing evaluation data

C. Monitoring of Program Operations
   1. Instructional Program Implementation
      a. Develop/select instruments
      b. Administer instruments
      c. Analyze data
      d. Interpret data
      e. Draft report section
   2. Staff Training
      a. Develop/select instruments
      b. Administer instruments
      c. Analyze data
      d. Interpret data
      e. Prepare report section
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Year 



Page 2 of 4

MONTHS: Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Jul

Tasks

   3. Parent Involvement
      a. Develop instruments
      b. Administer instruments
      c. Analyze data
      d. Interpret data
      e. Draft report section

D. Evaluation of Language Components
   1. Develop/select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

E. Evaluation of Nonlanguage Academic Components
   1. Select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

F. Evaluation of Nonacademic Components
   1. Develop/select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

G. Report
   a. Compile report sections
   b. Review report
   c. Prepare final report
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TIMETABLE FOR THE EVALUATION 
(Completed Sample) 



Year 




*Last possible time to do this. Ideally, this would also be done the
previous spring.

Page 4 of 4
Year ____

MONTHS: Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Jul

Tasks

   3. Parent Involvement
      a. Develop instruments
      b. Administer instruments
      c. Analyze data
      d. Interpret data
      e. Draft report section

D. Evaluation of Language Components
   1. Develop/select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

E. Evaluation of Nonlanguage Academic Components
   1. Select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

F. Evaluation of Nonacademic Components
   1. Develop/select instruments
   2. Administer instruments
   3. Analyze data
   4. Interpret data
   5. Draft report section

G. Report
   a. Compile report sections
   b. Review report
   c. Prepare final report

**Partial analysis, interpretation and reporting is done at this point.
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OPERATING CHECKLIST FOR BILINGUAL EDUCATION 
PROGRAM EVALUATION 



EVALUATION STEPS | Initiated | Completed

1. Planning, Managing, and Staffing the Evaluation
   1.1 Determination of audience for the evaluation
   1.2 Determine the focus of the evaluation
   1.3 Allocation of resources for evaluation activities
   1.4 Setting timelines for evaluation activities
   1.5 Develop overall management plan of evaluation
   1.6 Hire outside evaluator
   1.7 Assigning evaluation responsibilities to staff

2. Planning Data Collection for the Evaluation
   2.1 Description of program
   2.2 Description of students
   2.3 Description of program's goals

3. Planning Monitoring of Program Operations
   3.1 Description of program in operation
   3.2 Description of staff development activities
   3.3 Description of parent involvement

4. Planning Evaluation of Student Outcomes
   4.1 Selection of evaluation questions
   4.2 Selection of evaluation design for English,
       non-English, and other areas
   4.3 Selection of assessment instruments
   4.4 Scheduling the testing for the evaluation
   4.5 Designing procedures and scheduling data collection
   4.6 Planning the analysis of the data
   4.7 Reporting the results

5. Reporting the Results and Writing the Evaluation Report
   5.1 Identification of audiences and reporting requirements
   5.2 Establishing timelines
   5.3 Outline for report
   5.4 Analysis of the data
   5.5 Selection and convening of the interpretative
       panel for analyzing the data
   5.6 Writing the evaluation report and planning
       presentations of results
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Worksheet #5 

EVALUATION SUMMARY GUIDE 



Evaluation Questions | Evaluation Instruments | Source of Information | Data Collection (Who does it / When) | Data Analysis (Who / When) | Data Interpretation (Who / When) | Reporting (Who / When)

Program Description

Monitoring Program Operations

Student Outcomes

Worksheet #6
Part A



ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
DESCRIBING THE PROGRAM AND THE STUDENTS

Estimates are provided for three levels of evaluation activity for a given
year (different activity levels may occur each year):

a) Minimum - collect information from project proposal, school records,
   and project director.

b) Moderate - collect information [illegible] project director, and a
   sample (one to three people [illegible]). (For estimation purposes
   below, assume total number of people interviewed or receiving a
   questionnaire is eight.)

c) Major - same as that described for [illegible] each category are
   [illegible]. (For estimation purposes below, assume [illegible]
   interviewed or receiving questionnaires is fifteen and that three
   classrooms are observed.)

Level of Effort for a Given Year (in Days)

Task | Minimum | Moderate | Major | Your Estimate*

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Prepare data collection instruments (using samples
   provided in Designer's Manual)

3. Identify specific people or records from whom to
   collect data and make arrangements

4. Collect data



4 



14 



3 5 
5 12 



*Circle the estimate for any tasks which will be done by project staff
instead of the external evaluator. Do not include these amounts in
the total for the evaluator.

Worksheet #6                                             Page 2 of 2
Part A

Level of Effort for a Given Year (in Days)

Task | Minimum | Moderate | Major | Your Estimate

5. Analyze and organize data for use in report and
   analysis of evaluation data collected for later
   components | 2 | 4 | 6 | ____

Total Days | (5½) | (13½) | (25½) | (   ) Evaluator
                                  | (   ) Project Staff
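
The footnote's rule, that circled estimates are left out of the evaluator's total, can be illustrated with a small calculation. The sketch below is hypothetical throughout: the function name `split_effort` and every day figure are placeholders chosen for illustration and are not the estimates printed on the worksheet.

```python
def split_effort(tasks):
    """Split estimated days into evaluator and project-staff totals.

    `tasks` is a list of (task name, days, done_by_project_staff) tuples.
    Tasks flagged True stand for circled estimates on the worksheet and
    are kept out of the evaluator's total.
    """
    evaluator_days = sum(days for _, days, by_staff in tasks if not by_staff)
    staff_days = sum(days for _, days, by_staff in tasks if by_staff)
    return evaluator_days, staff_days


# Hypothetical estimates for illustration; not the worksheet's figures.
tasks = [
    ("Prepare plan and obtain director's support", 0.5, False),
    ("Prepare data collection instruments", 1.0, False),
    ("Identify people or records and make arrangements", 0.5, True),
    ("Collect data", 3.0, True),
    ("Analyze and organize data", 2.0, False),
]
evaluator_days, staff_days = split_effort(tasks)
```

Keeping the two totals separate makes it easier to carry only the evaluator's days forward to the cost summary in Part D.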
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Worksheet #6 
Part B 

ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
EVALUATING PROGRAM OPERATIONS

Estimates are provided for two levels of activity to be conducted during
a given year for each of three components: instructional methods, staff
development, and parent involvement. (Different levels of activity may
occur each year.)

Instructional Methods

a) Minimum - Conduct observations and interviews twice/year in only
   two classrooms and have evaluator do interpretation.

b) Major - Conduct observations and interviews three times/year in
   all classrooms (for estimation purposes below, assume
   total number of classrooms equals five) and have
   interpretative panel.*

Staff Training

a) Minimum - Same questionnaire given to trainees following each
   training session. Knowledge test not used and evaluator does
   interpretation. (For estimation purposes below, assume
   fifteen trainees and three training sessions.)

b) Major - Same as for minimum, plus a knowledge test given pre and
   post training, an end of project summary questionnaire
   given, and an interpretative panel used. (For estimation
   purposes below, assume fifteen trainees and three
   training sessions.)

Parent Involvement

a) Minimum - Address only the issue of the extent to which the level of
   parent involvement matched the planned level; evaluator
   interprets data.

b) Major - Address all four proposed evaluation questions given on
   page 81. (For estimation purposes below, assume ten
   parents and eight staff members interviewed); have
   interpretative panel.

*The alternative methods of interpreting the data are discussed in the
staffing chapter which follows.
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Part B 
Level of Effort (in Days)

Task | Minimum | Major | Your Estimate*

Instructional Method

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Prepare data collection instruments (using samples
   provided in Designer's Manual)

3. Identify who to observe and interview and make
   arrangements to do so

4. Collect data

5. Analyze data

6. Interpret data

7. Write report section



I 

i 
1 

1 

r 



5 

- 

■ 3 
v 2 



Total days 



(64) ( 



Staff Training

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Prepare data collection instruments (using samples
   provided in Designer's Manual)

3. Make arrangements for data collection

4. Collect data - minimum (have trainer collect all data);
   major (have trainer collect all data except end of year
   questionnaire)




Evaluator 



( 



) 



Project Staff 



, 4- 



t 9 



*Circle the estimate for any tasks which will be done by project staff
instead of the external evaluator. Do not include these amounts in the
total for the evaluator.
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Part B 

Level of Effort (in Days)

Task | Minimum | Major | Your Estimate

5. Analyze data

6. Interpret data and develop recommendations

7. Write report section



4 



7 



1* 



Total days 



(5) 



(19) 



( 



) 



Evaluator 



) 



Project Staff



Parent Involvement

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Prepare data collection instruments (using samples
   provided in Designer's Manual)

3. Make arrangements for data collection

4. Collect data

5. Analyze data

6. Interpret data and develop recommendations

7. Write report section



Total days 



1 

1 



1 

6 
3 

2 
2 



(15*) 



( 



) 



Evaluator



( 



) 



Project Staff 
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Worksheet #6 
Part C 



ESTIMATING LEVEL OF EFFORT REQUIREMENTS
FOR
EVALUATING STUDENT OUTCOMES

Estimates are provided for two levels of activity to be conducted during a
given year for each of four components: English language component, nonEnglish
language component, nonlanguage academic component, and nonacademic student
effects.

English Language Component

a) Minimum - Use norm-referenced evaluation design only; analyze by
   grade, subject, language used in instruction, and student
   proficiency; evaluator does interpretation.

b) Major - Use time series, norm-referenced and comparison group
   evaluation designs; analyze by grade, subject, language
   used in instruction, and student proficiency factors; use
   interpretative panel.

NonEnglish Language Component

a) Minimum - Use existing test and do norm-referenced evaluation design
   only; analyze by grade, subject, language used and student
   proficiency; evaluator does interpretation.

b) Major - Develop own test; use time series, norm-referenced and
   comparison designs; analyze by grade, subject, language
   used in instruction and student proficiency; use
   interpretative panel.

Nonlanguage Academic Component

a) Minimum - Use existing test, compare to national norms; analyze only
   by grade; evaluator does interpretation.

b) Major - Use existing test, compare to national norms; analyze by
   grade and two other key factors; use interpretative panel.

Nonacademic Student Effects

a) Minimum - Use only a published self-concept measure; analyze only by
   grade and student proficiency; evaluator does interpretation.

b) Major - Use all proposed evaluation questions and data collection
   instruments; analyze by grade and student proficiency; use
   interpretative panel.
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Part C 



Level of Effort (in Days)

Task | Minimum | Major | Your Estimate*

English Language Component

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Select appropriate tests

3. Train test administrators and make arrangements for
   testing

4. Supervise testing - minimum (one day each, pre- and
   post-testing); major (monitor all testing)

5. Analyze data - minimum (prepare achievement data for
   standard computer analysis); major (prepare data for
   standard computer analysis, for several analyses)

6. Interpret results

7. Write report section



3 
2 



4 
5 



14+ 



8+ 
10 
10+ 



Evaluator 



NonEnglish Language Component

1. Prepare, discuss with and obtain support of project
   director for proposed plan

2. Select appropriate tests                                  1

3. Train test administrators and make arrangements for
   testing                                                   1

*Circle the estimate for any tasks which will be done by project staff
instead of the external evaluator. Do not include these amounts in the
total for the evaluator.
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Part C - * 

Level of Effort (In Days) 
Task Minimum Major Your Estimate* 

4. Supervise testing - minimum (one day each, pre- and
   post-testing); major (monitor all testing)                2    10+

5. Analyze data - minimum (prepare achievement data for
   standard computer analysis); major (prepare data for
   standard computer analysis for several analyses)          2    8

6. Interpret results

7. Write report section                                      2    10



Total days 



(104) 



(454) 



( 



) 



Evaluator



( 



) 



Project Staff 



Nonlanguage Academic Component

1. Prepare, discuss with and obtain support from project
   director for proposed plan

2. Select appropriate tests - minimum (become familiar with
   district tests); major (review commercial achievement
   tests and match to curriculum)

3. Train test administrators and make arrangements for
   testing

4. Supervise testing - minimum (one day each, pre- and
   post-testing)

5. Analyze data - minimum (prepare achievement data for
   standard computer analysis); major (prepare data for
   standard computer analysis for several analyses)



5 
2 

10+ 



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Task . 
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Level of Effort (In Days) 
Minimum Major Your Estimate* 



6. Interpret results 

7. Write report section 



2 
2 



10 
8 



Total days 



(104) 



(45*) 



( ) 

Evaluator 



) 



Project Staff 



Nonacademic Component 

1. Prepare, discuss with and obtain support from project
   director for proposed plan

2. Select or develop appropriate instruments

3. Train test administrators and make arrangements for
   testing and other data collection

4. Analyze data - minimum (prepare for standard computer
   analysis)

5. Interpret results

6. Write report section



2 
2 
2 



i 



8 
10 
8 



Total days 



(84) 



(3H) 



( ) 

Evaluator 



( 



) 



Project Staff 



Worksheet #6
Part D




SUMMARY OF ESTIMATED LEVEL OF EFFORT 
REQUIREMENTS AND 
ASSOCIATED COSTS 
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Summary of Days 

                                      Evaluator    Project Staff

Program Description                   (   )        (   )

Monitoring Program Operations
   Instructional Methods
   Staff Training
   Parent Involvement

Evaluating Student Effects
   English Language Component
   NonEnglish Language Component
   Nonlanguage Academic Component
   Nonlanguage Student Effects

Total days X evaluator cost per day = Total evaluator cost per year
________ X ________ = ________

Part D

Additional Cost Items

Costs (in Dollars): Program Description | Monitoring Program Operations | Evaluating Student Effects

1. Secretary time
2. Printing
3. Mailing
4. Other
   a.
   b.
   c.
   d.

Totals

Total Evaluator Costs
Total Additional Costs
Total Costs for Evaluation
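
The arithmetic behind Part D (total days times evaluator cost per day, plus additional cost items) might be sketched as follows. The function name `evaluation_cost`, the daily rate, and all day and dollar figures are hypothetical placeholders, not values from the Handbook.

```python
def evaluation_cost(evaluator_days, cost_per_day, additional_costs):
    """Return (evaluator cost, additional cost, total cost) for one year."""
    evaluator_cost = sum(evaluator_days.values()) * cost_per_day
    additional = sum(additional_costs.values())
    return evaluator_cost, additional, evaluator_cost + additional


# All figures below are hypothetical placeholders.
evaluator_days = {
    "Program Description": 5.5,
    "Monitoring Program Operations": 12.0,
    "Evaluating Student Effects": 20.0,
}
additional_costs = {"Secretary time": 400.0, "Printing": 150.0, "Mailing": 50.0}
evaluator_cost, additional, total = evaluation_cost(
    evaluator_days, 200.0, additional_costs
)
```

Only the evaluator's days enter this calculation; days assigned to project staff are recorded in their own column and are not costed here.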



Worksheet #7

DATA COLLECTION FORM FOR INFORMATION FROM THE
PROJECT PROPOSAL AND OTHER RECORDS

The project proposal and various project or school records should be reviewed
to obtain the indicated information.
(C) 1. What are the major project goals?

Linguistically ____

Culturally ____

Academically ____

(S) 2. What is the pattern of predominant languages among the student
population?

(S) 3. What is the approximate achievement level (in languages, other
academic and nonacademic areas) of students within the various
language categories? Report separately for each language group.

Language achievement ____

Other academic achievement ____

Nonacademic achievement ____

C - refers to program context       G - refers to program goals
S - refers to program students      P - refers to instructional programs

(P) 4. What grade levels and how many classrooms are served by the project?

(P) 5. What portion of the school day is covered?

(C) 6. Describe the following community characteristics:

a. Languages spoken (approximate percentage speaking each language)

b. Ethnicity (approximate percentage of each)

c. Socioeconomic status (general description based on type of
   employment)

d. Size of community

(C) 7. Describe the local education agency as follows:

a. Size

b. Financial status of district

c. Facilities available for project

(C) 8. Describe the following school characteristics:

a. Number of bilinguals in school by language group

b. Number of students in bilingual program

c. Bilingual mix in the classrooms

(P) 9. Describe the project staff and its organiza[...]
the staff, the percentage of time committed [...]
qualifications

Title | Name | Percentage time | Qualifications

b. Describe the organizational structure of the project

c. What selection procedures are used in selecting staff members?

(P) 10. Describe the project director's role with respect to the following items:

a. Funds and budgets

b. Public relations

c. Administration

d. Overseeing instruction

e. Staff training

f. Developing and ordering materials and equipment

g. Staff recruiting and hiring

Worksheet #8



PROGRAM DIRECTOR INTERVIEW SCHEDULE 



(G) l.The goals of the program as stated in the proposal are "follows: 
(Present, the goals orally or in writing as obta.ned from the pro- 
posal.) * 



What evidence will show that these gpals have been met?_ 



Which goals have the highest priori ty?_ , _ 



(G* 2. How would you define theprogram as to the extent which it is a 
intenance, transitional or'partial bilingual program? 



ma I I 



(C) * 3. Oescribe the mobility of the community including any specific data 
ava i lab 1 e 



9 

ERIC 




C * Irff er s to program content 
G prefers to program goals 
S * refers to students 
P * refers to instructional program 



451 
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it 
\ 



(P) k. How are students assigned to classrooms?^ 



±_ 



(S) 5. Describe the student ehtry and^exH--'cr i ter i a and procedures. Do the 
actual procedures conform t9 the -(panned prpcedures?, . '„ 

' A- f . — : 

• - ; ; ~ r " ~ 

(P) 6. Describe the scheduling of • i nst ruct i&n>fic) udi n'g da ijy schedules and 
grouping and regroup'i ng 'across and within classes 



T 



(P) 7. Describe the staff and its organization, i.n terms of the following 
di mens ions 

a. Staff members! time commi tmentS 



b. Staff organizational structure^ 



c. Staff qualifications 



d. Staff selection procedures 



452 
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(p) 8. What is your general leadership style as program director? 



(P) 9. What is your'Vole as program director with respect to 
following areas? 

a. Funds and budgets , 



/ 






PublJc relations 






D. 








' ym . 




Admin? stration 






c. 










*d. 


A V arcPAinq instruction 














e. 


Vraff traininq 














e. 


Developing and ordering materials 


and eauioment 
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f. Staff recruiting and hiring 



(P) 10. What is the teacher's role in the following areas?

        a. Planning instruction

        b. Implementing instruction

        c. Noninstructional responsibilities

(P) 11. What is the role of the aides in the program?

(P) 12. What is the role of other staff members such as the following?

        a. Instructional coordinator

        b. Community coordinator

        c. Evaluator

        d. Other (please specify)

        e. Other (please specify)

(P) 13. Describe the program's staff development activities related to the
        following aspects.

        a. Needs assessment

        b. Structure of training (pre-service and in-service)

        c. Characteristics of training

           (1) Appropriateness for staff of differing levels of knowledge
               and experience

           (2) Practicality

           (3) Coordination with degree programs
           (4) Integration with other training



        d. Audiences trained (program and/or nonprogram staff)

(P) 14. Describe the involvement of the community and parents with respect
        to the following items.

        a. Parent involvement in school affairs

        b. Community input in program planning

        c. Evidences of community support for the program

        d. Parent education

        e. Parent conferences/counseling
(P) 15. Describe the means of communication of the following groups.
        a. Among program staff

        b. Program staff with the following nonproject staff:

           (1) Principals

           (2) Other district administrators

           (3) Nonprogram teachers

           (4) School board

(P) 16. What means are used to disseminate project information to school
        personnel, parents, and community?
Worksheet #9

PROGRAM STAFF INTERVIEW SCHEDULE

(Check one)   ___ Program staff   ___ Bilingual teacher

(G)* 1. What is the intended content of instruction (i.e., the theoretical
        curriculum) with respect to the following matters?

        a. Content areas covered

        b. Relationship of content to program goals

        c. Who determines the content?

        d. What articulation is there between program content and
           extant district curriculum?

(P) 2. Describe the presentation of content with respect to the following
       items.

       a. Type of instruction model or theory (e.g., concurrent, alternate
          week/day, preview-review, half day, resource room, and/or
          bilingual aide)

* C = refers to program content
  S = refers to students
  G = refers to program goals
  P = refers to instructional program



       b. Organizational practices (e.g., individualized, large group,
          learning centers, peer tutoring, small group instruction, and/or
          team teaching)

(P) 3. Describe the methodologies employed for bilingual education with
       respect to the following items.

       a. Language of instruction

          (1) General language use plan of teacher and student over length
              of program

          (2) Daily instructional time in each language

          (3) Variations for different student groups

          (4) Criteria for establishing language of instruction



       b. Approach to nonstandard forms

          (1) Acceptance

          (2) Form of corrections

       c. Approach to second language instruction

          (1) Formal instruction

          (2) Functional use of second language for content instruction
              and other activities

       d. Approach to reading instruction

          (1) Language in which students learn to read

          (2) Criteria for beginning reading in second language

(P) 4. Describe the specific instructional methodologies used in each
       subject area.
(P) 5. Describe those aspects of the program that are intended to motivate
       students and improve their self-concept with respect to the
       following matters:

       a. Appropriate content and language of instruction

          (1) Using L1 for instruction

          (2) Accepting language of the student

          (3) Content that relates to experiences of students

          (4) Culturally relevant material

       b. Improved affective climate

          (1) Placing equal value on both languages and cultures

          (2) Insuring student success
          (3) Involving parents



          (4) Teacher as a role model

       c. Discipline approach

          (1) Philosophy

          (2) Guidelines/approach to control

          (3) Special reward systems (e.g., prizes and privileges)

(P) 6. What materials are used within each of the following categories?

       a. Core materials in use

          (1) Commercial

          (2) Locally developed
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b. Appropriateness 
(1) Linguistic 



(2) Cultural 



(P) 7. Describe the role of each of the following personnel in the classroom 
a. Teachers 



b. Aides 



c. Parents 



d. Peers 



e. Resource staff 

(P) 8. Describe the program director's work with respect to the following:

       a. Leadership style

       b. Role or responsibilities in connection with each of the following:

          (1) Funds and budgets

          (2) Public relations

          (3) Administration

          (4) Overseeing instruction

          (5) Staff training

          (6) Developing and ordering materials and equipment
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Worksheet #10 

LOCAL AND DISTRICT ADMINISTRATOR'S INTERVIEW SCHEDULE

(G)* 1. Describe the school district's general goals.

(C) 2. What is the school district's philosophy toward language and
       cultural diversity?

(P) 3. To what extent is there articulation of program content with the
       existing district curriculum?

(P) 4. What is the relationship between the program staff and each of the
       following categories of district personnel? Comment specially
       on program acceptance.

       a. Principals

       b. Central office administrators

       c. Nonproject teachers

       d. The school board

* C = refers to program context
  S = refers to students
  G = refers to program goals
  P = refers to instructional program
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(P) 5. Describe the dissemination of program information to the following 
two groups.

a. School personnel

b. Parents and the community
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CLASSROOM OBSERVATION SCHEDULE 



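Date: ____________            Class Hour: ____________

Instructor: ____________      Observer: ____________

I. List the content areas covered during the class period as they occur.

   1. ____________________   time started: ______   time ended: ______
   2. ____________________   time started: ______   time ended: ______
   3. ____________________   time started: ______   time ended: ______
   4. ____________________   time started: ______   time ended: ______
   5. ____________________   time started: ______   time ended: ______
   6. ____________________   time started: ______   time ended: ______
   7. ____________________   time started: ______   time ended: ______
   8. ____________________   time started: ______   time ended: ______
   9. ____________________   time started: ______   time ended: ______

II. List the instructional methodologies employed as they occur during the
    period:

    Summary statement (enter at end of period):

III. The beginning and ending time for each of the instructional components
     of the class period can be indicated in item I above. In addition, the
     observer can indicate here estimates of how much time fell within each
     of three categories during each three-minute segment of the class
     period.

     Three-              On-task   On-task      Three-              On-task   On-task
     Minute   Off-task   Students  Students     Minute   Off-task   Students  Students
     Period   Time       Active*   Passive      Period   Time       Active*   Passive

        1     ______     ______    ______         11     ______     ______    ______
        2     ______     ______    ______         12     ______     ______    ______
        3     ______     ______    ______         13     ______     ______    ______
        4     ______     ______    ______         14     ______     ______    ______
        5     ______     ______    ______         15     ______     ______    ______
        6     ______     ______    ______         16     ______     ______    ______
        7     ______     ______    ______         17     ______     ______    ______
        8     ______     ______    ______         18     ______     ______    ______
        9     ______     ______    ______         19     ______     ______    ______
       10     ______     ______    ______         20     ______     ______    ______

     * One or more students engaged in behavior for which they get feedback
       from the teacher.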
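The tallies from items I and III of the observation schedule lend themselves to a simple summary. The sketch below is illustrative only and is not part of the Handbook: it assumes the observer records, for each three-minute period, an estimate in minutes for each of the three categories (so the three estimates sum to three), and that start and end times in item I are written as 24-hour HH:MM strings. The names Segment, time_on_task_shares, and minutes_per_content_area are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

SEGMENT_MINUTES = 3.0  # each observation period on the form is three minutes


@dataclass
class Segment:
    """One row of item III: estimated minutes in each category (assumed unit)."""
    off_task: float
    on_task_active: float
    on_task_passive: float


def time_on_task_shares(segments):
    """Return the share of observed time that fell in each category."""
    if not segments:
        raise ValueError("no three-minute periods were recorded")
    for number, seg in enumerate(segments, start=1):
        total = seg.off_task + seg.on_task_active + seg.on_task_passive
        if abs(total - SEGMENT_MINUTES) > 1e-6:
            raise ValueError(
                f"period {number} sums to {total} minutes, expected {SEGMENT_MINUTES}"
            )
    observed = SEGMENT_MINUTES * len(segments)
    return {
        "off_task": sum(s.off_task for s in segments) / observed,
        "on_task_active": sum(s.on_task_active for s in segments) / observed,
        "on_task_passive": sum(s.on_task_passive for s in segments) / observed,
    }


def minutes_per_content_area(entries):
    """Total the minutes per content area from item I (area, started, ended)."""
    totals = {}
    for area, started, ended in entries:
        start = datetime.strptime(started, "%H:%M")
        end = datetime.strptime(ended, "%H:%M")
        minutes = (end - start).total_seconds() / 60
        if minutes < 0:
            raise ValueError(f"{area}: end time precedes start time")
        totals[area] = totals.get(area, 0.0) + minutes
    return totals
```

For example, an observer might call time_on_task_shares with the twenty rows from one class period and report the on-task active share alongside the minutes per content area. Treating each estimate as minutes that must sum to three is a design choice made here so that inconsistent rows are caught early; the form itself does not prescribe a unit.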
IV. Describe any variations in teaching approach used for different
    student groups (include any variations in pace of instruction for
    individuals or groups).

    Summary statement (enter at end of period):

V. Describe any evidence of self-concept development and motivation,
   including indicators of (a) accepting the language of the student and
   (b) content that relates to the experience of the students.

   Summary statement (enter at end of period):

VI. Describe the role of all of the following personnel who were present
    in the classroom.

    (1) Teachers:

    (2) Aides:

    (3) Parents:

    (4) Peers:

    (5) Resource staff:
Worksheet #12

PROGRAM OPERATIONS INTERVIEW SCHEDULE FOR TEACHERS

1. What are the major instructional methods that you employ?

2. Why do you use these particular methods, i.e., are these particular
   methods directed to particular instructional objectives?

3. Are there other instructional methods that you would prefer to employ if
   it were not for various circumstantial constraints that you face?

4. If so, what are these constraints?

5. What program changes would you recommend that would facilitate your
   efforts to provide the best instruction possible?
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6, How typical would you say the class period that we observed was in terms of 
the instructional approach used and the nature and amount of interaction 
with students? How was it atypical? 



7. How do the entry and exit criteria and procedures actually used differ from
those planned for the project? (Interviewer: Be prepared to describe the
planned procedure. This information can be obtained through W #13.)
Worksheet #13



STAFF DEVELOPMENT QUESTIONNAIRE

Name of training activity:
Date of training:

Name of person completing questionnaire (optional):

1. In general, what expectations did you have for the staff training
   provided as part of this project?

2. To what extent were these expectations met?

3. Based on your knowledge of the objectives for this staff training, which
   objectives do you think have been met?

4. Which objectives do you think have not been met?
Worksheet #14

INTERVIEW SCHEDULE FOR LEADER OF PARENT ACTIVITIES

1. What is the general scope of parent involvement which was planned for the
   project this year?

2. To what extent have these goals changed since the beginning of the project
   year?

3. To what extent have these goals been met?

4. Are you satisfied with the level of parent involvement? Is the staff as a
   whole satisfied?

5. To what extent and in what ways has parent involvement changed over the
   life of the project?

6. What are the most positive aspects of parent activities?



7. What aspects of the parent involvement have the most potential for
   improvement?

8. What changes are you recommending be made in parent activities in the
   future?
Worksheet #15

PARENT INTERVIEW SCHEDULE

(C)* 1. To what extent have you been involved in school affairs?

(P) 2. To what extent are you aware that the school has gotten
       suggestions and reactions from the community in planning
       its bilingual education program?

(C) 3. How much community support do you believe there is for the
       bilingual education project?

(P) 4. How much education has the school district provided for you
       as a parent as part of the bilingual education project?

(P) 5. To what extent are you aware that the school has provided
       parent counseling or conferences?

(P) 6. What information have you received about the bilingual
       education project from the school district?

* P = refers to instructional programs
  C = refers to program context



(P) 7. The bilingual program has as one of its goals (fill in the
       goals related to parent involvement). To what extent do you
       think this goal has been met? What evidence do you know of
       that indicates this goal has been met?

* P = refers to instructional programs
  C = refers to program context
Worksheet #16

EVALUATION DESIGN WORKSHEET

I. Subject Area and Language:                          Date:

   Tests: HR1:

II. Program Student Description:

    Language Skills:  English:

                      Other:

    Other Descriptors:

III. Comparison Data (Groups and Years)

     Student Groups      Test Code      Current Year      Earlier Years

IV. Evaluation Questions

    Student Performance

    1. Relative Standards of Performance:

    2. Absolute Standards of Performance:
Worksheet #17

BILINGUAL PROGRAM EVALUATION REPORT OUTLINE

Section                                                  Check When Done

I. Executive Summary (3-5 pages)

   A. Overview of project goals, numbers and types
      of students served, instructional approach,
      and evaluation design

   B. Summary of findings

      1. Instructional methods

      2. Parent involvement component

      3. Staff development

      4. Student outcomes

         a. English language

         b. NonEnglish language

         c. Nonlanguage academic

         d. Nonacademic student effects

   C. Recommendations

II. Program Overview and Background (2 pages)

   A. Context of program including community
      characteristics, LEA, and school
      description

   B. Student description and needs

   C. Program's major goals

   D. Program methods

   E. Size, scope, and definition of the program
III. Description of Evaluation (3 pages)
   A. Purposes and audiences

   B. Evaluation staff and roles

   C. Design

      1. Questions addressed (includes
         standards for comparison)

      2. Constraints and questions not addressed

   D. Relationship to past and future years'
      evaluations

IV. Program and Student Description

   A. Target students

      1. Definition of project student

      2. Student selection criteria and method

         a. Tests and cut-off scores used

         b. Role of teacher judgment

         c. Role of parent wishes

         d. Method of combining criteria

      3. Exit criteria and follow-up

      4. Student turnover

      5. Student characteristics at beginning
         of year

         a. Language proficiency

            (1) English

            (2) NonEnglish language

         b. Achievement level

         c. Biographic data



   B. Instructional Approach

      1. Self-concept and cultural emphasis

      2. Content of instruction

      3. Presentation of content

         a. instructional model or theory

         b. methodologies for bilingual
            education

         c. specific methodologies for each
            subject area

         d. rate of presentation

         e. self-concept development
            and motivation

         f. materials

         g. personnel roles in classrooms

      4. Scheduling

   C. Program Management

      1. staff organization

      2. staff roles

         a. Project Director

         b. Teachers

         c. Aides

         d. Other staff

      3. staff development

         a. needs assessment

         b. structure of training

         c. characteristics of training

         d. audiences trained

      4. parents and community

      5. communication

      6. dissemination of project information



VII. Parent Involvement Component

   A. Goals and objectives

   B. Description of activities to be
      evaluated

   C. Evaluation procedures

      1. Measures used

      2. Data collection procedures

      3. Analysis procedures

   D. Evaluation outcomes

      1. Results (including unanticipated
         outcomes)

      2. Interpretations

VIII. Staff Development

   A. Goals and objectives

   B. Description of activities to be
      evaluated

   C. Evaluation procedures

      1. Measures used

      2. Data collection procedures

      3. Analysis procedures

   D. Evaluation outcomes

      1. Results (including unanticipated
         outcomes)

      2. Interpretation

V. Student Effects

   A. English language component

      1. Goals and objectives

V. Continued

      2. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      3. Evaluation outcomes

         a. Results (including unanticipated
            outcomes)

         b. Interpretation

         c. Recommendations

   B. NonEnglish language component

      1. Goals and objectives

      2. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      3. Evaluation outcomes

         a. Results (including unanticipated
            outcomes)

         b. Interpretation

         c. Recommendations

   C. Nonlanguage academic component

      1. Goals and objectives

      2. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      3. Evaluation outcomes

         a. Results (including unanticipated
            outcomes)

         b. Interpretation

         c. Recommendations

VI. Continued

   D. Nonacademic component

      1. Goals and objectives

      2. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      3. Evaluation outcomes

         a. Results (including unanticipated
            outcomes)

         b. Interpretation

Recommendations

   A. Program operations

      1. Instructional approach

      2. Program management

   B. Parent involvement

   C. Staff development

   D. Student effects

VII. Program Operations Evaluation

   A. Instructional Approach

      1. Goals and objectives

      2. Description of activities to be
         evaluated

      3. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      4. Evaluation outcomes

         a. Results

         b. Interpretations



   B. Program Management

      1. Goals and objectives

      2. Description of activities to be
         evaluated

      3. Evaluation procedures

         a. Measures used

         b. Data collection procedures

         c. Analysis procedures

      4. Evaluation outcomes

         a. Results

         b. Interpretations
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PROGRAM INFORMATION ACQUISITION FORM 



                                     Available      Should it Be Done?   If Yes, When?
Type of Information                  Instruments    (Yes or No)          (List Date)

A. Program Overview

   1. Grades and number of classrooms
      served

   2. Portion of day covered

   3. Definition of program -
      maintenance, transitional,
      partial bilingual

B. Instructional Approach

   1. Self-concept and cultural
      emphasis

   2. Content of instruction

      a. Content areas covered

      b. Who determines content

      c. Other content features

         (1) Relationship of content
             to goals

         (2) Articulation of project
             content with existing
             district curriculum

   3. Presentation of content



w n 
w n 



Proposal 
w #8 



W #9 
W #9 
W #9 
W #9 
W #9 

W #9 

W #9 
W #9 



W #9 



      a. Instructional model or theory      W #9

         (1) Type, e.g., concurrent,
             alternate day/week,
             preview-review, half
             day, resource room,
             and/or bilingual aide

         (2) Organizational practices,
             e.g., individualized,
             large group, learning
             centers, peer tutoring,
             small group instruction,
             and/or team teaching



      b. Methodologies for bilingual
         education                                   W #9

         (1) Language of instruction                 W #9

             (a) General language use
                 plan of teacher and
                 student over
                 length of project                   W #9

             (b) Daily instructional
                 time in each language               W #9

             (c) Variations for
                 different student
                 groups                              W #9

             (d) Criteria for establishing
                 language of instruction             W #9

         (2) Approach to nonstandard
             forms                                   W #9

             (a) Acceptance                          W #9

             (b) Form of corrections                 W #9

         (3) Approach to second
             language instruction                    W #9

             (a) Formal instruction                  W #9

             (b) Functional use of
                 second language for
                 content instruction
                 and other activities                W #9

         (4) Approach to reading
             instruction                             W #9

             (a) Language in which
                 students learn to
                 read                                W #9

             (b) Criteria for beginning
                 reading in second
                 language                            W #9

      c. Specific methodologies for
         each subject area                           W #9

      d. Rate of presentation                        W #11

         (1) Variation in pace of
             instruction for
             individuals or groups                   W #11

         (2) Time on task                            W #11

             (a) Minutes per day per
                 content area (see
                 scheduling, 5-b.)                   W #11

             (b) Proportion of time
                 student is actively
                 engaged in producing
                 responses for which
                 s/he gets feedback

      e. Self-concept development and
         motivation (aspects of program
         that may motivate students and              W #9
         improve their self-concept)                 W #11

         (1) Appropriate content and                 W #9
             language of instruction                 W #11

             (a) Using L1 for                        W #9
                 instruction                         W #11

             (b) Accepting the                       W #9
                 language of student                 W #11

             (c) Content that relates
                 to experience of                    W #9
                 students                            W #11

             (d) Culturally relevant                 W #9
                 material                            W #11

         (2) Improved affective climate              W #9

             (a) Placing equal value
                 on both languages
                 and cultures                        W #9

             (b) Insuring student
                 success                             W #9

             (c) Involving parents                   W #9



         (3) Discipline approach                     W #9

             (a) Philosophy                          W #9

             (b) Guidelines/approach
                 to control                          W #9

             (c) Special reward systems,
                 e.g., prizes and
                 privileges                          W #9

      f. Materials                                   W #9

         (1) Core materials in use                   W #9

             (a) Commercial                          W #9

             (b) Locally developed                   W #9

         (2) Appropriateness                         W #9

             (a) Linguistic                          W #9

             (b) Cultural                            W #9

      g. Personnel roles in classroom                W #9

         (1) Teachers                                W #9, W #11

         (2) Aides                                   W #9, W #11

         (3) Parents                                 W #9, W #11

         (4) Peers                                   W #9, W #11

         (5) Resource staff                          W #9, W #11

   4. Scheduling                                     W #8

      a. Grouping and regrouping                     W #8

         (1) Across classes                          W #8

         (2) Within classes                          W #8

      b. Daily schedules                             W #8



C. Management

   1. Staff Organization

      a. List of staff members
         and time commitment

      b. Organizational structure

      c. Qualifications

      d. Selection procedures

   2. Staff Roles (describe
      responsibilities)

      a. Project Director

         (1) Style of leadership
             as determined by
             project and LEA

         (2) Funds and budgets

         (3) Public relations

         (4) Administration

         (5) Overseeing instruction

         (6) Staff training

         (7) Developing and ordering
             materials and equipment

         (8) Staff recruiting and
             hiring



U #7 
W #8 



W # 7 
W #8 

W n 

W #8 

W #8 

W #8 

W #8 
W #9 

W #8 
W #9 

w n 
w n 

w #9 

w n 
w us 

w # 8 
w #9- 

w IB 
W *9 

W #8 
W #9 

W #8 
W #9 

W *8 
W #9 






      b. Teachers

         (1) Planning instruction

         (2) Implementing instruction

         (3) Noninstructional
             responsibilities

      c. Aides

      d. Other staff

         (1) Instructional coordinator

         (2) Community coordinator

         (3) Evaluator

   3. Staff Development (Describe)

      a. Needs assessment

      b. Structure of training

         (1) Pre-service

         (2) In-service

      c. Characteristics of Training

         (1) Appropriateness for staff
             of differing levels of
             knowledge and experience

         (2) Practicality

         (3) Coordination with degree
             programs

         (4) Integration with other
             training

      d. Audiences Trained

         (1) Project staff included

         (2) Inclusion of non-project
             staff



W #8
W #8
W #8

W #8
W #8
W #8

W #8
W #8
W #8

W #8
W #8
W #8
W #8

W #8
W #8

W #8
W #8

W #8

W #8



   4. Parents and Community

      a. Parent involvement in
         school affairs

      b. Community input in program
         planning, e.g., through
         advisory group

      c. Community support for
         project

      d. Parent education

      e. Parent conferences/counseling

   5. Communication

      a. Staff relations

      b. Relations with nonproject
         staff

         (1) District administrators

         (2) Principals

         (3) Nonproject teachers

         (4) School board

   6. Dissemination of project
      information

      a. School personnel

      b. Parents and community



W #8
W #15
W #8
W #15

W #8
W #15

W #8
W #15

W #8
W #15

W #8
W #15

W #8

W #8

W #8
W #8

W #8

W #8

W #8

W #8

W #8
W #15

W #8

W #8
W #15
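An evaluator filling in this form may find it convenient to turn the completed rows into a list of what each worksheet must cover. The sketch below is only an illustration of that bookkeeping and is not part of the Handbook: the function name plan_by_instrument is hypothetical, the instrument codes follow the form's own "W #n" notation, and the yes/no answers and dates in ROWS are made-up sample entries.

```python
from collections import defaultdict

# Each row: (type of information, available instruments,
#            should it be done?, date if yes).
# The answers and dates below are hypothetical sample data.
ROWS = [
    ("Instructional model or theory", ["W #9"], True, "1983-02-01"),
    ("Language of instruction", ["W #9"], True, "1983-01-15"),
    ("Rate of presentation", ["W #11"], True, "1983-02-10"),
    ("Involving parents", ["W #9"], False, None),
]


def plan_by_instrument(rows):
    """Group the rows marked 'yes' by instrument, ordered by date."""
    plan = defaultdict(list)
    for info, instruments, should_do, when in rows:
        if not should_do:
            continue  # rows answered 'no' are left out of the plan
        for instrument in instruments:
            plan[instrument].append((when, info))
    # ISO-style date strings sort chronologically as plain text.
    return {instrument: sorted(items) for instrument, items in plan.items()}


if __name__ == "__main__":
    for instrument, items in plan_by_instrument(ROWS).items():
        print(instrument)
        for when, info in items:
            print(f"  {when}  {info}")
```

Keeping dates as ISO-style strings is a simplification chosen here so that sorting needs no date parsing; a project that records dates in another format would need to convert them first.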
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